roberta-tiny-10M

This model was trained from scratch; the training dataset is not documented. It achieves the following results on the evaluation set:

Loss: 2.7391
Accuracy: 0.5148
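
For readers who prefer perplexity to raw loss, the reported evaluation loss can be converted with a single exponential. This is a minimal sketch that assumes the loss is a mean cross-entropy in nats over the predicted tokens, which the card does not state explicitly. Under that assumption the result is a perplexity of roughly 15.5.

```python
import math

# Evaluation loss as reported above.
eval_loss = 2.7391

# Assumption: the loss is a mean cross-entropy measured in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
```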

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0004
train_batch_size: 16
eval_batch_size: 32
seed: 42
gradient_accumulation_steps: 32
total_train_batch_size: 512
optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 50
num_epochs: 100.0
mixed_precision_training: Native AMP
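
The batch-size and scheduler settings above can be checked with a little arithmetic. The sketch below is not the original training code. It assumes a single device (which is what makes 16 x 32 equal the reported 512) and a total of about 4800 optimizer steps, an estimate taken from the results log rather than a number stated in the card. The function `cosine_lr` is a hypothetical helper that follows the usual linear-warmup, half-cosine-decay shape.

```python
import math

# Values copied from the hyperparameter list.
LEARNING_RATE = 4e-4
TRAIN_BATCH_SIZE = 16      # per-device batch size
GRAD_ACCUM_STEPS = 32
WARMUP_STEPS = 50

# Assumption: one device; the card does not list the device count.
NUM_DEVICES = 1
effective_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES

# Assumption: roughly 48 optimizer steps per epoch over 100 epochs.
TOTAL_STEPS = 4800


def cosine_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup to base_lr, then half-cosine decay to zero at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print("effective batch size:", effective_batch_size)
for step in (0, 25, 50, 2425, 4800):
    print(step, f"{cosine_lr(step):.6f}")
```

With only 50 warmup steps, the learning rate reaches its peak of 0.0004 after about one epoch and decays for the rest of the run.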

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
7.8031	1.04	50	7.3560	0.0606
7.1948	2.08	100	6.7374	0.1182
6.8927	3.12	150	6.5022	0.1415
6.7339	4.16	200	6.4005	0.1483
6.6609	5.21	250	6.3535	0.1510
6.1972	6.25	300	6.3324	0.1519
6.1685	7.29	350	6.3029	0.1528
6.1302	8.33	400	6.2828	0.1521
6.093	9.37	450	6.2568	0.1536
6.0543	10.41	500	6.2430	0.1544
6.0479	11.45	550	6.2346	0.1541
6.0372	12.49	600	6.2232	0.1546
6.0127	13.53	650	6.2139	0.1541
5.968	14.58	700	6.2053	0.1547
5.9635	15.62	750	6.1996	0.1549
5.9479	16.66	800	6.1953	0.1548
5.9371	17.7	850	6.1887	0.1545
5.9046	18.74	900	6.1613	0.1545
5.8368	19.78	950	6.0952	0.1557
5.7914	20.82	1000	6.0330	0.1569
5.7026	21.86	1050	5.9430	0.1612
5.491	22.9	1100	5.6100	0.1974
4.9289	23.95	1150	4.9607	0.2702
4.5214	24.99	1200	4.5795	0.3051
4.5663	26.04	1250	4.3454	0.3265
4.3717	27.08	1300	4.1738	0.3412
4.1483	28.12	1350	4.0336	0.3555
3.9988	29.16	1400	3.9180	0.3677
3.8695	30.21	1450	3.8108	0.3782
3.5017	31.25	1500	3.7240	0.3879
3.4311	32.29	1550	3.6426	0.3974
3.3517	33.33	1600	3.5615	0.4068
3.2856	34.37	1650	3.4915	0.4156
3.227	35.41	1700	3.4179	0.4255
3.1675	36.45	1750	3.3636	0.4325
3.0908	37.49	1800	3.3083	0.4394
3.0561	38.53	1850	3.2572	0.4473
3.0139	39.58	1900	3.2159	0.4525
2.9837	40.62	1950	3.1789	0.4575
2.9387	41.66	2000	3.1431	0.4618
2.9034	42.7	2050	3.1163	0.4654
2.8822	43.74	2100	3.0842	0.4694
2.836	44.78	2150	3.0583	0.4727
2.8129	45.82	2200	3.0359	0.4760
2.7733	46.86	2250	3.0173	0.4776
2.7589	47.9	2300	2.9978	0.4812
2.7378	48.95	2350	2.9788	0.4831
2.7138	49.99	2400	2.9674	0.4844
2.8692	51.04	2450	2.9476	0.4874
2.8462	52.08	2500	2.9342	0.4893
2.8312	53.12	2550	2.9269	0.4900
2.7834	54.16	2600	2.9111	0.4917
2.7822	55.21	2650	2.8987	0.4934
2.584	56.25	2700	2.8844	0.4949
2.5668	57.29	2750	2.8808	0.4965
2.5536	58.33	2800	2.8640	0.4982
2.5403	59.37	2850	2.8606	0.4982
2.5294	60.41	2900	2.8441	0.5008
2.513	61.45	2950	2.8402	0.5013
2.5105	62.49	3000	2.8316	0.5022
2.4897	63.53	3050	2.8237	0.5027
2.4974	64.58	3100	2.8187	0.5040
2.4799	65.62	3150	2.8129	0.5044
2.4741	66.66	3200	2.8056	0.5057
2.4582	67.7	3250	2.8025	0.5061
2.4389	68.74	3300	2.7913	0.5076
2.4539	69.78	3350	2.7881	0.5072
2.4252	70.82	3400	2.7884	0.5082
2.4287	71.86	3450	2.7784	0.5093
2.4131	72.9	3500	2.7782	0.5099
2.4016	73.95	3550	2.7724	0.5098
2.3998	74.99	3600	2.7659	0.5111
2.5475	76.04	3650	2.7650	0.5108
2.5443	77.08	3700	2.7620	0.5117
2.5381	78.12	3750	2.7631	0.5115
2.5269	79.16	3800	2.7578	0.5122
2.5288	80.21	3850	2.7540	0.5124
2.3669	81.25	3900	2.7529	0.5125
2.3631	82.29	3950	2.7498	0.5132
2.3499	83.33	4000	2.7454	0.5136
2.3726	84.37	4050	2.7446	0.5141
2.3411	85.41	4100	2.7403	0.5144
2.3321	86.45	4150	2.7372	0.5146
2.3456	87.49	4200	2.7389	0.5146
2.3372	88.53	4250	2.7384	0.5151
2.343	89.58	4300	2.7398	0.5144
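
The log above stops at epoch 89.58 (step 4300) although num_epochs was set to 100; the card does not say why. The following sketch parses the last five rows of the log to find the row with the lowest validation loss and to estimate the number of optimizer steps per epoch. The estimate of sequences per epoch assumes the total train batch size of 512 from the hyperparameter list and ignores any partial final batch, so treat it as approximate.

```python
# Last five rows of the training log, tab-separated:
# training loss, epoch, step, validation loss, accuracy.
LOG_TAIL = """\
2.3411\t85.41\t4100\t2.7403\t0.5144
2.3321\t86.45\t4150\t2.7372\t0.5146
2.3456\t87.49\t4200\t2.7389\t0.5146
2.3372\t88.53\t4250\t2.7384\t0.5151
2.343\t89.58\t4300\t2.7398\t0.5144
"""

rows = []
for line in LOG_TAIL.strip().splitlines():
    train_loss, epoch, step, val_loss, accuracy = line.split("\t")
    rows.append({
        "train_loss": float(train_loss),
        "epoch": float(epoch),
        "step": int(step),
        "val_loss": float(val_loss),
        "accuracy": float(accuracy),
    })

# Row with the lowest validation loss among the parsed rows.
best = min(rows, key=lambda r: r["val_loss"])

# Optimizer steps per epoch, estimated from the final logged row.
last = rows[-1]
steps_per_epoch = last["step"] / last["epoch"]

# Assumption: 512 sequences per optimizer step (total_train_batch_size).
approx_sequences_per_epoch = round(steps_per_epoch) * 512

print("best step by validation loss:", best["step"], best["val_loss"])
print("steps per epoch ~", round(steps_per_epoch))
print("sequences per epoch ~", approx_sequences_per_epoch)
```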

Framework versions

Transformers 4.24.0
PyTorch 1.11.0+cu113
Datasets 2.6.1
Tokenizers 0.12.1

g8a9/roberta-tiny-10M
