# collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1017
- Num Input Tokens Seen: 30391200
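
The checkpoint can be loaded with the `transformers` library. The snippet below is a minimal sketch, assuming the weights are published under the Hugging Face repository id `jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2` and that you have access to the gated google/gemma-2-2b base model; the prompt and generation settings are illustrative only.

```python
# Minimal loading sketch (assumed repo id; not an official usage example).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma-2 is commonly run in bfloat16
    device_map="auto",           # requires `accelerate`
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```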
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
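
The exact training script is not published. As a sketch under that caveat, the listed hyperparameters map onto the Hugging Face `Trainer` API roughly as follows; the output directory is a placeholder, and the dataset/`Trainer` wiring is omitted.

```python
# Sketch of TrainingArguments matching the hyperparameters listed above.
# The real training script is not provided; output_dir is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 x 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```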
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
No log | 0 | 0 | 1.3956 | 0 |
1.6925 | 0.0091 | 5 | 1.3858 | 274360 |
1.4659 | 0.0183 | 10 | 1.3191 | 554728 |
1.4901 | 0.0274 | 15 | 1.2524 | 829128 |
1.2538 | 0.0365 | 20 | 1.1937 | 1108016 |
1.2819 | 0.0457 | 25 | 1.1684 | 1390936 |
1.0879 | 0.0548 | 30 | 1.1576 | 1671768 |
1.0761 | 0.0640 | 35 | 1.1720 | 1947856 |
0.9364 | 0.0731 | 40 | 1.1667 | 2228328 |
0.8285 | 0.0822 | 45 | 1.2083 | 2515248 |
0.7714 | 0.0914 | 50 | 1.2007 | 2796696 |
0.7316 | 0.1005 | 55 | 1.2211 | 3077936 |
0.5592 | 0.1096 | 60 | 1.2119 | 3353176 |
0.5585 | 0.1188 | 65 | 1.2018 | 3626024 |
0.4803 | 0.1279 | 70 | 1.2017 | 3898832 |
0.5021 | 0.1370 | 75 | 1.1963 | 4175912 |
0.4514 | 0.1462 | 80 | 1.2121 | 4455072 |
0.3612 | 0.1553 | 85 | 1.1931 | 4727720 |
0.4515 | 0.1645 | 90 | 1.1881 | 5009488 |
0.4461 | 0.1736 | 95 | 1.1880 | 5282608 |
0.5034 | 0.1827 | 100 | 1.1860 | 5553496 |
0.5685 | 0.1919 | 105 | 1.1842 | 5836064 |
0.4516 | 0.2010 | 110 | 1.1854 | 6114952 |
0.2958 | 0.2101 | 115 | 1.1750 | 6392272 |
0.3735 | 0.2193 | 120 | 1.1766 | 6663208 |
0.3907 | 0.2284 | 125 | 1.1676 | 6944456 |
0.4901 | 0.2376 | 130 | 1.1709 | 7221960 |
0.3111 | 0.2467 | 135 | 1.1608 | 7500464 |
0.3151 | 0.2558 | 140 | 1.1681 | 7786536 |
0.3311 | 0.2650 | 145 | 1.1629 | 8061032 |
0.3119 | 0.2741 | 150 | 1.1624 | 8339776 |
0.425 | 0.2832 | 155 | 1.1626 | 8614064 |
0.3599 | 0.2924 | 160 | 1.1609 | 8885704 |
0.3478 | 0.3015 | 165 | 1.1554 | 9166584 |
0.4074 | 0.3106 | 170 | 1.1529 | 9453272 |
0.24 | 0.3198 | 175 | 1.1585 | 9734480 |
0.3161 | 0.3289 | 180 | 1.1508 | 10011232 |
0.3567 | 0.3381 | 185 | 1.1568 | 10284712 |
0.3651 | 0.3472 | 190 | 1.1469 | 10565320 |
0.2963 | 0.3563 | 195 | 1.1513 | 10834768 |
0.3133 | 0.3655 | 200 | 1.1498 | 11114320 |
0.4982 | 0.3746 | 205 | 1.1447 | 11395816 |
0.3136 | 0.3837 | 210 | 1.1435 | 11676048 |
0.2945 | 0.3929 | 215 | 1.1452 | 11957056 |
0.2632 | 0.4020 | 220 | 1.1417 | 12225504 |
0.2754 | 0.4111 | 225 | 1.1421 | 12506816 |
0.2892 | 0.4203 | 230 | 1.1411 | 12778688 |
0.3303 | 0.4294 | 235 | 1.1351 | 13052448 |
0.3272 | 0.4386 | 240 | 1.1422 | 13325752 |
0.2219 | 0.4477 | 245 | 1.1361 | 13612800 |
0.3318 | 0.4568 | 250 | 1.1347 | 13888688 |
0.3058 | 0.4660 | 255 | 1.1358 | 14167640 |
0.3574 | 0.4751 | 260 | 1.1317 | 14443576 |
0.3944 | 0.4842 | 265 | 1.1296 | 14722000 |
0.3048 | 0.4934 | 270 | 1.1306 | 14994688 |
0.2954 | 0.5025 | 275 | 1.1313 | 15271576 |
0.3244 | 0.5116 | 280 | 1.1269 | 15548760 |
0.371 | 0.5208 | 285 | 1.1297 | 15821744 |
0.3526 | 0.5299 | 290 | 1.1274 | 16091768 |
0.2937 | 0.5391 | 295 | 1.1271 | 16364464 |
0.3097 | 0.5482 | 300 | 1.1230 | 16641960 |
0.3057 | 0.5573 | 305 | 1.1273 | 16918448 |
0.3099 | 0.5665 | 310 | 1.1251 | 17193440 |
0.283 | 0.5756 | 315 | 1.1235 | 17470240 |
0.3392 | 0.5847 | 320 | 1.1248 | 17749104 |
0.3276 | 0.5939 | 325 | 1.1205 | 18032184 |
0.2521 | 0.6030 | 330 | 1.1216 | 18317360 |
0.2278 | 0.6122 | 335 | 1.1183 | 18588736 |
0.2214 | 0.6213 | 340 | 1.1208 | 18864160 |
0.3554 | 0.6304 | 345 | 1.1189 | 19143568 |
0.2126 | 0.6396 | 350 | 1.1188 | 19430928 |
0.3241 | 0.6487 | 355 | 1.1182 | 19712432 |
0.2468 | 0.6578 | 360 | 1.1167 | 19992936 |
0.302 | 0.6670 | 365 | 1.1179 | 20275360 |
0.225 | 0.6761 | 370 | 1.1145 | 20554416 |
0.2699 | 0.6852 | 375 | 1.1150 | 20833584 |
0.2959 | 0.6944 | 380 | 1.1127 | 21116288 |
0.3684 | 0.7035 | 385 | 1.1135 | 21393272 |
0.2894 | 0.7127 | 390 | 1.1132 | 21664504 |
0.3468 | 0.7218 | 395 | 1.1104 | 21945840 |
0.3365 | 0.7309 | 400 | 1.1112 | 22224640 |
0.2756 | 0.7401 | 405 | 1.1138 | 22492512 |
0.2134 | 0.7492 | 410 | 1.1097 | 22774128 |
0.273 | 0.7583 | 415 | 1.1099 | 23054632 |
0.248 | 0.7675 | 420 | 1.1095 | 23332744 |
0.4175 | 0.7766 | 425 | 1.1101 | 23610928 |
0.2982 | 0.7857 | 430 | 1.1105 | 23886096 |
0.2497 | 0.7949 | 435 | 1.1085 | 24164752 |
0.2912 | 0.8040 | 440 | 1.1079 | 24441944 |
0.3517 | 0.8132 | 445 | 1.1078 | 24716256 |
0.3852 | 0.8223 | 450 | 1.1070 | 24992216 |
0.3735 | 0.8314 | 455 | 1.1088 | 25271800 |
0.3185 | 0.8406 | 460 | 1.1092 | 25558096 |
0.2549 | 0.8497 | 465 | 1.1083 | 25837144 |
0.1872 | 0.8588 | 470 | 1.1066 | 26120576 |
0.2247 | 0.8680 | 475 | 1.1073 | 26393552 |
0.2985 | 0.8771 | 480 | 1.1055 | 26672072 |
0.27 | 0.8862 | 485 | 1.1037 | 26957208 |
0.2618 | 0.8954 | 490 | 1.1059 | 27236264 |
0.2642 | 0.9045 | 495 | 1.1053 | 27515256 |
0.2234 | 0.9137 | 500 | 1.1039 | 27791360 |
0.3124 | 0.9228 | 505 | 1.1068 | 28070688 |
0.3348 | 0.9319 | 510 | 1.1028 | 28340240 |
0.3423 | 0.9411 | 515 | 1.1021 | 28613928 |
0.24 | 0.9502 | 520 | 1.1043 | 28889472 |
0.2406 | 0.9593 | 525 | 1.1058 | 29170016 |
0.2347 | 0.9685 | 530 | 1.1031 | 29451680 |
0.2342 | 0.9776 | 535 | 1.1043 | 29728536 |
0.3459 | 0.9868 | 540 | 1.1039 | 30007456 |
0.2486 | 0.9959 | 545 | 1.1014 | 30279832 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1