
collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1114
  • Num Input Tokens Seen: 21798600
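The checkpoint can be loaded like any other causal language model with the Transformers library. A minimal loading sketch, assuming the repository id jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0 and the BF16 weights listed for this model (the prompt is just an example):

```python
# Minimal sketch for loading and sampling from the fine-tuned checkpoint.
# The repository id and bfloat16 dtype are taken from this card; everything else is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```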

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
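For reference, the listed settings expressed as Transformers `TrainingArguments`. This is an illustrative sketch only, since the actual training script is not published; the output directory and the `bf16` flag are assumptions:

```python
# Sketch of the hyperparameters above as TrainingArguments; not the original training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0",  # placeholder name
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,   # 8 x 16 = 128 total train batch size
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,  # assumption: matches the BF16 tensor type of the saved weights
)
```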

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.5454 | 0.0129 | 5 | 1.3798 | 284096 |
| 1.5595 | 0.0258 | 10 | 1.2917 | 565048 |
| 1.4801 | 0.0388 | 15 | 1.2113 | 857320 |
| 1.2583 | 0.0517 | 20 | 1.1640 | 1143680 |
| 1.2536 | 0.0646 | 25 | 1.1396 | 1426896 |
| 1.1682 | 0.0775 | 30 | 1.1244 | 1704888 |
| 1.1565 | 0.0905 | 35 | 1.1242 | 1985240 |
| 1.0138 | 0.1034 | 40 | 1.1384 | 2269216 |
| 0.9845 | 0.1163 | 45 | 1.1461 | 2554344 |
| 0.91 | 0.1292 | 50 | 1.1554 | 2839272 |
| 0.9047 | 0.1422 | 55 | 1.1678 | 3127496 |
| 0.9137 | 0.1551 | 60 | 1.1697 | 3415328 |
| 0.8846 | 0.1680 | 65 | 1.1704 | 3692024 |
| 0.9215 | 0.1809 | 70 | 1.1719 | 3967168 |
| 0.8233 | 0.1939 | 75 | 1.1850 | 4244568 |
| 0.6717 | 0.2068 | 80 | 1.1881 | 4531936 |
| 0.7733 | 0.2197 | 85 | 1.1770 | 4817232 |
| 0.6835 | 0.2326 | 90 | 1.1663 | 5103112 |
| 0.7503 | 0.2456 | 95 | 1.1860 | 5388248 |
| 0.6998 | 0.2585 | 100 | 1.1702 | 5669656 |
| 0.615 | 0.2714 | 105 | 1.1739 | 5956384 |
| 0.5807 | 0.2843 | 110 | 1.1799 | 6233928 |
| 0.6475 | 0.2973 | 115 | 1.1703 | 6517360 |
| 0.649 | 0.3102 | 120 | 1.1702 | 6802600 |
| 0.6409 | 0.3231 | 125 | 1.1747 | 7086032 |
| 0.6033 | 0.3360 | 130 | 1.1629 | 7364952 |
| 0.4875 | 0.3489 | 135 | 1.1752 | 7650744 |
| 0.6259 | 0.3619 | 140 | 1.1664 | 7933080 |
| 0.5287 | 0.3748 | 145 | 1.1703 | 8220488 |
| 0.4745 | 0.3877 | 150 | 1.1645 | 8501544 |
| 0.4469 | 0.4006 | 155 | 1.1667 | 8781400 |
| 0.5011 | 0.4136 | 160 | 1.1652 | 9056664 |
| 0.4512 | 0.4265 | 165 | 1.1630 | 9337208 |
| 0.5347 | 0.4394 | 170 | 1.1630 | 9620568 |
| 0.5226 | 0.4523 | 175 | 1.1626 | 9896128 |
| 0.4775 | 0.4653 | 180 | 1.1568 | 10176840 |
| 0.5018 | 0.4782 | 185 | 1.1642 | 10461520 |
| 0.508 | 0.4911 | 190 | 1.1530 | 10741632 |
| 0.3972 | 0.5040 | 195 | 1.1550 | 11024096 |
| 0.4409 | 0.5170 | 200 | 1.1539 | 11301736 |
| 0.5384 | 0.5299 | 205 | 1.1477 | 11579816 |
| 0.4633 | 0.5428 | 210 | 1.1501 | 11865648 |
| 0.5198 | 0.5557 | 215 | 1.1410 | 12156088 |
| 0.3293 | 0.5687 | 220 | 1.1480 | 12434448 |
| 0.4762 | 0.5816 | 225 | 1.1375 | 12720344 |
| 0.5467 | 0.5945 | 230 | 1.1424 | 13003704 |
| 0.4776 | 0.6074 | 235 | 1.1361 | 13292824 |
| 0.4567 | 0.6204 | 240 | 1.1398 | 13574560 |
| 0.4565 | 0.6333 | 245 | 1.1371 | 13859632 |
| 0.4899 | 0.6462 | 250 | 1.1369 | 14136888 |
| 0.3492 | 0.6591 | 255 | 1.1327 | 14421200 |
| 0.4968 | 0.6721 | 260 | 1.1315 | 14707344 |
| 0.3487 | 0.6850 | 265 | 1.1329 | 14988680 |
| 0.4001 | 0.6979 | 270 | 1.1258 | 15267688 |
| 0.3161 | 0.7108 | 275 | 1.1308 | 15540888 |
| 0.4089 | 0.7237 | 280 | 1.1262 | 15816840 |
| 0.3835 | 0.7367 | 285 | 1.1289 | 16098568 |
| 0.4023 | 0.7496 | 290 | 1.1270 | 16387224 |
| 0.5333 | 0.7625 | 295 | 1.1243 | 16672848 |
| 0.492 | 0.7754 | 300 | 1.1276 | 16955104 |
| 0.3361 | 0.7884 | 305 | 1.1215 | 17232984 |
| 0.4585 | 0.8013 | 310 | 1.1210 | 17517512 |
| 0.3541 | 0.8142 | 315 | 1.1232 | 17805408 |
| 0.4862 | 0.8271 | 320 | 1.1195 | 18086744 |
| 0.5085 | 0.8401 | 325 | 1.1208 | 18374072 |
| 0.4206 | 0.8530 | 330 | 1.1198 | 18654568 |
| 0.3501 | 0.8659 | 335 | 1.1154 | 18936680 |
| 0.4675 | 0.8788 | 340 | 1.1207 | 19213288 |
| 0.3692 | 0.8918 | 345 | 1.1151 | 19495512 |
| 0.3526 | 0.9047 | 350 | 1.1162 | 19777904 |
| 0.5192 | 0.9176 | 355 | 1.1134 | 20053800 |
| 0.5117 | 0.9305 | 360 | 1.1101 | 20335472 |
| 0.3685 | 0.9435 | 365 | 1.1152 | 20620416 |
| 0.3554 | 0.9564 | 370 | 1.1103 | 20898680 |
| 0.4323 | 0.9693 | 375 | 1.1123 | 21181272 |
| 0.4111 | 0.9822 | 380 | 1.1120 | 21465480 |
| 0.3962 | 0.9952 | 385 | 1.1119 | 21742008 |
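The validation loss drops sharply over the first ~2M input tokens and then drifts slowly down toward the final 1.11. A short sketch for visualizing this curve from a subset of the rows above (values copied verbatim from the table; the full table can be substituted):

```python
# Sketch: plot validation loss against input tokens seen for a few checkpoints from the table.
import matplotlib.pyplot as plt

tokens_seen = [0, 1_426_896, 2_839_272, 5_669_656, 11_301_736, 16_955_104, 21_742_008]
val_loss = [1.3956, 1.1396, 1.1554, 1.1702, 1.1539, 1.1276, 1.1119]

plt.plot(tokens_seen, val_loss, marker="o")
plt.xlabel("Input tokens seen")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-2b_hs2_accumulate_iter3_sftsd0")
plt.show()
```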

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1