# collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd1
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.0913
- Num Input Tokens Seen: 30890616
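
The card does not include a usage example. The snippet below is a minimal loading sketch, assuming the checkpoint is published on the Hugging Face Hub under `RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd1` and that `transformers` (plus `accelerate` for `device_map="auto"`) is installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; adjust if the checkpoint lives elsewhere.
model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the dtype stored in the checkpoint
    device_map="auto",    # requires the `accelerate` package
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```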
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
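
The training script itself is not part of this card; the block below is only a sketch of how the listed values map onto `transformers.TrainingArguments` under the standard `Trainer` setup. The `output_dir` name is illustrative.

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above; the actual
# training script is not included in this card.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd1",  # illustrative
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,  # 8 x 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```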
### Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.4951 | 0.0088 | 5 | 1.3821 | 271232 |
1.4563 | 0.0176 | 10 | 1.3208 | 543664 |
1.3667 | 0.0264 | 15 | 1.2559 | 821864 |
1.2468 | 0.0352 | 20 | 1.1999 | 1091344 |
1.1733 | 0.0441 | 25 | 1.1730 | 1367232 |
1.0267 | 0.0529 | 30 | 1.1804 | 1638224 |
0.864 | 0.0617 | 35 | 1.1806 | 1913808 |
0.7861 | 0.0705 | 40 | 1.1954 | 2189912 |
0.7807 | 0.0793 | 45 | 1.2160 | 2452272 |
0.5332 | 0.0881 | 50 | 1.2235 | 2725128 |
0.573 | 0.0969 | 55 | 1.2244 | 2999016 |
0.5712 | 0.1057 | 60 | 1.2128 | 3271968 |
0.4288 | 0.1146 | 65 | 1.1903 | 3549280 |
0.4498 | 0.1234 | 70 | 1.1855 | 3823400 |
0.348 | 0.1322 | 75 | 1.1776 | 4090400 |
0.4052 | 0.1410 | 80 | 1.1760 | 4365584 |
0.3448 | 0.1498 | 85 | 1.1634 | 4637496 |
0.3418 | 0.1586 | 90 | 1.1639 | 4900760 |
0.3926 | 0.1674 | 95 | 1.1575 | 5171952 |
0.4322 | 0.1762 | 100 | 1.1566 | 5443000 |
0.3339 | 0.1850 | 105 | 1.1545 | 5712944 |
0.4672 | 0.1939 | 110 | 1.1510 | 5985072 |
0.315 | 0.2027 | 115 | 1.1509 | 6252048 |
0.3656 | 0.2115 | 120 | 1.1440 | 6523272 |
0.4343 | 0.2203 | 125 | 1.1468 | 6796536 |
0.3248 | 0.2291 | 130 | 1.1404 | 7067320 |
0.3063 | 0.2379 | 135 | 1.1457 | 7335176 |
0.3174 | 0.2467 | 140 | 1.1412 | 7607696 |
0.2611 | 0.2555 | 145 | 1.1442 | 7880176 |
0.3732 | 0.2643 | 150 | 1.1361 | 8151896 |
0.275 | 0.2732 | 155 | 1.1407 | 8428120 |
0.2902 | 0.2820 | 160 | 1.1367 | 8702104 |
0.2883 | 0.2908 | 165 | 1.1359 | 8970264 |
0.2804 | 0.2996 | 170 | 1.1360 | 9242488 |
0.2668 | 0.3084 | 175 | 1.1313 | 9514312 |
0.3018 | 0.3172 | 180 | 1.1331 | 9792568 |
0.2895 | 0.3260 | 185 | 1.1287 | 10067840 |
0.319 | 0.3348 | 190 | 1.1288 | 10336576 |
0.2636 | 0.3437 | 195 | 1.1277 | 10614920 |
0.2802 | 0.3525 | 200 | 1.1280 | 10884976 |
0.3354 | 0.3613 | 205 | 1.1252 | 11161384 |
0.348 | 0.3701 | 210 | 1.1268 | 11432472 |
0.2536 | 0.3789 | 215 | 1.1230 | 11709552 |
0.2744 | 0.3877 | 220 | 1.1237 | 11979744 |
0.274 | 0.3965 | 225 | 1.1238 | 12250848 |
0.3241 | 0.4053 | 230 | 1.1207 | 12526408 |
0.3095 | 0.4141 | 235 | 1.1204 | 12793864 |
0.2996 | 0.4230 | 240 | 1.1202 | 13056144 |
0.2803 | 0.4318 | 245 | 1.1202 | 13331664 |
0.3346 | 0.4406 | 250 | 1.1167 | 13607696 |
0.2643 | 0.4494 | 255 | 1.1170 | 13877856 |
0.3123 | 0.4582 | 260 | 1.1186 | 14147416 |
0.3048 | 0.4670 | 265 | 1.1167 | 14418600 |
0.408 | 0.4758 | 270 | 1.1154 | 14693312 |
0.3059 | 0.4846 | 275 | 1.1167 | 14958704 |
0.2863 | 0.4934 | 280 | 1.1133 | 15234336 |
0.2354 | 0.5023 | 285 | 1.1144 | 15507664 |
0.2094 | 0.5111 | 290 | 1.1138 | 15779648 |
0.3262 | 0.5199 | 295 | 1.1116 | 16048520 |
0.2988 | 0.5287 | 300 | 1.1128 | 16315984 |
0.1602 | 0.5375 | 305 | 1.1114 | 16586704 |
0.2703 | 0.5463 | 310 | 1.1109 | 16856960 |
0.2671 | 0.5551 | 315 | 1.1105 | 17130984 |
0.2595 | 0.5639 | 320 | 1.1100 | 17405032 |
0.2584 | 0.5728 | 325 | 1.1103 | 17672464 |
0.2967 | 0.5816 | 330 | 1.1074 | 17940736 |
0.2693 | 0.5904 | 335 | 1.1111 | 18209096 |
0.2368 | 0.5992 | 340 | 1.1083 | 18489328 |
0.3227 | 0.6080 | 345 | 1.1095 | 18763392 |
0.2433 | 0.6168 | 350 | 1.1079 | 19033928 |
0.2663 | 0.6256 | 355 | 1.1064 | 19306496 |
0.2232 | 0.6344 | 360 | 1.1078 | 19582464 |
0.215 | 0.6432 | 365 | 1.1057 | 19855128 |
0.285 | 0.6521 | 370 | 1.1041 | 20118936 |
0.2812 | 0.6609 | 375 | 1.1047 | 20386944 |
0.2726 | 0.6697 | 380 | 1.1061 | 20661136 |
0.2298 | 0.6785 | 385 | 1.1036 | 20934448 |
0.2719 | 0.6873 | 390 | 1.1043 | 21212424 |
0.2636 | 0.6961 | 395 | 1.1053 | 21483592 |
0.2778 | 0.7049 | 400 | 1.1019 | 21759880 |
0.2443 | 0.7137 | 405 | 1.1011 | 22031808 |
0.3002 | 0.7225 | 410 | 1.1028 | 22308840 |
0.2201 | 0.7314 | 415 | 1.1026 | 22581432 |
0.3103 | 0.7402 | 420 | 1.1011 | 22852504 |
0.2672 | 0.7490 | 425 | 1.0994 | 23120392 |
0.3186 | 0.7578 | 430 | 1.1016 | 23393176 |
0.2821 | 0.7666 | 435 | 1.1007 | 23666664 |
0.3132 | 0.7754 | 440 | 1.0987 | 23941552 |
0.2671 | 0.7842 | 445 | 1.0978 | 24216152 |
0.1736 | 0.7930 | 450 | 1.0975 | 24490968 |
0.3105 | 0.8019 | 455 | 1.0980 | 24765600 |
0.3713 | 0.8107 | 460 | 1.0961 | 25042848 |
0.3498 | 0.8195 | 465 | 1.0968 | 25319312 |
0.2632 | 0.8283 | 470 | 1.0983 | 25596904 |
0.308 | 0.8371 | 475 | 1.0951 | 25873656 |
0.2886 | 0.8459 | 480 | 1.0952 | 26149160 |
0.2547 | 0.8547 | 485 | 1.0952 | 26423016 |
0.2806 | 0.8635 | 490 | 1.0948 | 26701520 |
0.2446 | 0.8723 | 495 | 1.0947 | 26970808 |
0.2854 | 0.8812 | 500 | 1.0940 | 27243712 |
0.2576 | 0.8900 | 505 | 1.0945 | 27513104 |
0.2532 | 0.8988 | 510 | 1.0961 | 27784952 |
0.3655 | 0.9076 | 515 | 1.0942 | 28053616 |
0.2836 | 0.9164 | 520 | 1.0941 | 28325080 |
0.2758 | 0.9252 | 525 | 1.0963 | 28595744 |
0.2029 | 0.9340 | 530 | 1.0943 | 28870736 |
0.2777 | 0.9428 | 535 | 1.0943 | 29146344 |
0.2305 | 0.9516 | 540 | 1.0959 | 29417184 |
0.3159 | 0.9605 | 545 | 1.0959 | 29684608 |
0.3386 | 0.9693 | 550 | 1.0919 | 29958936 |
0.1623 | 0.9781 | 555 | 1.0933 | 30227592 |
0.3154 | 0.9869 | 560 | 1.0950 | 30506240 |
0.2721 | 0.9957 | 565 | 1.0915 | 30779280 |
### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1