roberta-tiny-2l-10M

This model was trained from scratch on an unspecified dataset. It achieves the following results on the evaluation set:

Loss: 3.1695
Accuracy: 0.4534
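
For readers who prefer perplexity, the reported loss can be converted as in the sketch below. It assumes the loss is a mean cross-entropy in nats over the predicted tokens, which the card does not state explicitly. For a masked language model the result is best read as a pseudo-perplexity over the scored tokens only. The function name `loss_to_perplexity` is illustrative.

```python
import math

# Reported evaluation loss, copied from the card.
EVAL_LOSS = 3.1695


def loss_to_perplexity(loss: float) -> float:
    """Convert a mean cross-entropy loss (assumed to be in nats) to perplexity."""
    return math.exp(loss)


perplexity = loss_to_perplexity(EVAL_LOSS)
print(f"perplexity: {perplexity:.2f}")
```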

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0004
train_batch_size: 16
eval_batch_size: 32
seed: 42
gradient_accumulation_steps: 32
total_train_batch_size: 512
optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 50
num_epochs: 100.0
mixed_precision_training: Native AMP
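
The relationship between these values can be checked with a short sketch. The per-device batch size, accumulation steps, warmup steps, and peak learning rate are taken from the list above. The device count (1) and the total step count (4800, estimated from roughly 48 optimizer steps per epoch in the results table times 100 epochs) are assumptions, not values stated in the card. The schedule follows the common linear-warmup, half-cosine-decay shape and may differ in detail from what the training run used.

```python
import math

# Values copied from the hyperparameter list.
LEARNING_RATE = 4e-4
TRAIN_BATCH_SIZE = 16
GRAD_ACCUM_STEPS = 32
WARMUP_STEPS = 50

# Assumptions (not stated in the card).
NUM_DEVICES = 1      # assumed single device
TOTAL_STEPS = 4800   # assumed: about 48 steps per epoch * 100 epochs


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Number of sequences contributing to one optimizer step."""
    return per_device * accum * devices


def cosine_lr(step: int, peak_lr: float = LEARNING_RATE,
              warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
for s in (0, 25, 50, 2400, 4800):
    print(s, cosine_lr(s))
```

With one device, 16 * 32 gives the total_train_batch_size of 512 listed above.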

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
7.7619	1.04	50	7.2338	0.0748
7.0524	2.08	100	6.6252	0.1331
6.8423	3.12	150	6.4622	0.1463
6.7298	4.16	200	6.3971	0.1488
6.669	5.21	250	6.3628	0.1519
6.2038	6.25	300	6.3371	0.1518
6.1783	7.29	350	6.3115	0.1532
6.1459	8.33	400	6.2922	0.1530
6.1096	9.37	450	6.2696	0.1536
6.0745	10.41	500	6.2545	0.1541
6.0689	11.45	550	6.2496	0.1533
6.0562	12.49	600	6.2313	0.1542
6.0324	13.53	650	6.2248	0.1536
5.9907	14.58	700	6.2179	0.1544
5.9683	15.62	750	6.1832	0.1545
5.9236	16.66	800	6.1413	0.1550
5.8808	17.7	850	6.0900	0.1558
5.8392	18.74	900	6.0543	0.1566
5.7962	19.78	950	6.0222	0.1575
5.7473	20.82	1000	5.9471	0.1617
5.5787	21.86	1050	5.7038	0.1891
5.2316	22.9	1100	5.2708	0.2382
4.6613	23.95	1150	4.7075	0.2975
4.3006	24.99	1200	4.4180	0.3222
4.3754	26.04	1250	4.2383	0.3385
4.2531	27.08	1300	4.1157	0.3491
4.0987	28.12	1350	4.0197	0.3578
4.0045	29.16	1400	3.9504	0.3656
3.9145	30.21	1450	3.8819	0.3718
3.5808	31.25	1500	3.8279	0.3781
3.5354	32.29	1550	3.7830	0.3826
3.4788	33.33	1600	3.7400	0.3872
3.4315	34.37	1650	3.7028	0.3911
3.3906	35.41	1700	3.6629	0.3956
3.3508	36.45	1750	3.6344	0.3984
3.288	37.49	1800	3.6046	0.4019
3.2678	38.53	1850	3.5799	0.4053
3.2382	39.58	1900	3.5549	0.4074
3.2151	40.62	1950	3.5285	0.4103
3.1777	41.66	2000	3.5069	0.4132
3.1499	42.7	2050	3.4917	0.4150
3.131	43.74	2100	3.4701	0.4168
3.0942	44.78	2150	3.4530	0.4189
3.0683	45.82	2200	3.4320	0.4212
3.0363	46.86	2250	3.4195	0.4227
3.0264	47.9	2300	3.4046	0.4249
3.0079	48.95	2350	3.3874	0.4267
2.9869	49.99	2400	3.3792	0.4277
3.1592	51.04	2450	3.3655	0.4289
3.1353	52.08	2500	3.3548	0.4310
3.1257	53.12	2550	3.3489	0.4308
3.0822	54.16	2600	3.3353	0.4327
3.0771	55.21	2650	3.3220	0.4341
2.8639	56.25	2700	3.3119	0.4354
2.8477	57.29	2750	3.3104	0.4360
2.8373	58.33	2800	3.2954	0.4378
2.818	59.37	2850	3.2935	0.4381
2.8137	60.41	2900	3.2786	0.4394
2.7985	61.45	2950	3.2747	0.4401
2.7936	62.49	3000	3.2668	0.4411
2.7764	63.53	3050	3.2569	0.4419
2.7819	64.58	3100	3.2492	0.4434
2.7672	65.62	3150	3.2494	0.4433
2.7629	66.66	3200	3.2410	0.4443
2.747	67.7	3250	3.2368	0.4446
2.7303	68.74	3300	3.2246	0.4460
2.7461	69.78	3350	3.2212	0.4462
2.7179	70.82	3400	3.2217	0.4470
2.7184	71.86	3450	3.2132	0.4479
2.7077	72.9	3500	3.2086	0.4487
2.6916	73.95	3550	3.2057	0.4482
2.6934	74.99	3600	3.2010	0.4495
2.8585	76.04	3650	3.1980	0.4497
2.8559	77.08	3700	3.1940	0.4503
2.8519	78.12	3750	3.1940	0.4506
2.8391	79.16	3800	3.1897	0.4509
2.845	80.21	3850	3.1858	0.4510
2.6636	81.25	3900	3.1819	0.4518
2.6569	82.29	3950	3.1834	0.4517
2.647	83.33	4000	3.1798	0.4517
2.6665	84.37	4050	3.1786	0.4525
2.6382	85.41	4100	3.1733	0.4525
2.6346	86.45	4150	3.1700	0.4532
2.6457	87.49	4200	3.1714	0.4529
2.6328	88.53	4250	3.1686	0.4537
2.6429	89.58	4300	3.1715	0.4534
2.6369	90.62	4350	3.1687	0.4538
2.628	91.66	4400	3.1651	0.4539
2.6373	92.7	4450	3.1660	0.4539
2.6357	93.74	4500	3.1662	0.4537
2.6302	94.78	4550	3.1695	0.4533
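
A table like this is easy to scan programmatically. The sketch below parses a handful of rows copied from the table and picks the one with the lowest validation loss. The names `LogRow` and `parse_log` are illustrative, and only an excerpt is embedded, so the result covers those rows alone.

```python
from typing import List, NamedTuple


class LogRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    accuracy: float


# Excerpt of rows copied from the results table (whitespace-separated).
LOG_EXCERPT = """
7.7619 1.04 50 7.2338 0.0748
5.7473 20.82 1000 5.9471 0.1617
4.3006 24.99 1200 4.4180 0.3222
2.9869 49.99 2400 3.3792 0.4277
2.628 91.66 4400 3.1651 0.4539
2.6302 94.78 4550 3.1695 0.4533
"""


def parse_log(text: str) -> List[LogRow]:
    """Parse whitespace-separated log rows into typed records."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, acc = line.split()
        rows.append(LogRow(float(train_loss), float(epoch), int(step),
                           float(val_loss), float(acc)))
    return rows


rows = parse_log(LOG_EXCERPT)
best = min(rows, key=lambda r: r.val_loss)
print(f"lowest validation loss {best.val_loss} at step {best.step}")
```

In the table, the lowest validation loss (3.1651) occurs at step 4400, slightly before the last logged step.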

Framework versions

Transformers 4.24.0
Pytorch 1.11.0+cu113
Datasets 2.6.1
Tokenizers 0.12.1
