
collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set (a minimal loading sketch follows these results):

  • Loss: 1.0940
  • Num Input Tokens Seen: 41008256
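
For quick use, here is a minimal sketch of loading this checkpoint with the Transformers auto classes. The repository id is taken from the model tree at the bottom of this card, BF16 matches the published tensor type, and the prompt and generation settings are illustrative assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id from the model tree at the bottom of this card.
model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are published in BF16
    device_map="auto",           # assumes `accelerate` is installed
)

# Illustrative prompt and generation settings, not part of the original card.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```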

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged TrainingArguments sketch follows this list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
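
These settings map onto the Hugging Face Trainer roughly as below; a minimal sketch, assuming a single GPU (so 8 per-device sequences x 16 accumulation steps gives the total train batch size of 128) and the standard TrainingArguments parameter names; the output_dir is illustrative:

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters above as TrainingArguments.
# Assumes one GPU: 8 per-device x 16 accumulation steps = 128 effective batch.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd1",  # illustrative
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```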

Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen
No log 0 0 1.3909 0
1.622 0.0066 5 1.3875 269304
1.5034 0.0131 10 1.3529 535392
1.4652 0.0197 15 1.2845 801488
1.3824 0.0263 20 1.2330 1083632
1.3513 0.0329 25 1.1882 1355072
1.1303 0.0394 30 1.1821 1635432
0.9928 0.0460 35 1.1924 1894472
0.8215 0.0526 40 1.2128 2161232
0.8303 0.0592 45 1.2421 2428280
0.5895 0.0657 50 1.2467 2702640
0.5274 0.0723 55 1.2585 2973544
0.4315 0.0789 60 1.2433 3236296
0.4844 0.0855 65 1.2217 3506176
0.3115 0.0920 70 1.2198 3780160
0.3854 0.0986 75 1.2028 4051568
0.3065 0.1052 80 1.1925 4324928
0.3682 0.1118 85 1.1846 4593592
0.5041 0.1183 90 1.1806 4867408
0.2775 0.1249 95 1.1759 5128832
0.2909 0.1315 100 1.1737 5401472
0.3715 0.1381 105 1.1742 5673312
0.3444 0.1446 110 1.1667 5945400
0.3783 0.1512 115 1.1666 6217600
0.2508 0.1578 120 1.1635 6483312
0.2896 0.1644 125 1.1591 6757952
0.2647 0.1709 130 1.1586 7031456
0.1641 0.1775 135 1.1563 7296128
0.2283 0.1841 140 1.1550 7571176
0.2946 0.1906 145 1.1524 7847912
0.2922 0.1972 150 1.1484 8116960
0.2966 0.2038 155 1.1481 8393608
0.268 0.2104 160 1.1539 8663712
0.2847 0.2169 165 1.1498 8925096
0.2498 0.2235 170 1.1483 9194968
0.2431 0.2301 175 1.1496 9464256
0.2411 0.2367 180 1.1453 9727032
0.2876 0.2432 185 1.1429 9997984
0.3148 0.2498 190 1.1435 10271224
0.2655 0.2564 195 1.1408 10546488
0.2446 0.2630 200 1.1415 10805248
0.2493 0.2695 205 1.1428 11074256
0.2977 0.2761 210 1.1383 11346264
0.3008 0.2827 215 1.1380 11612816
0.212 0.2893 220 1.1349 11891040
0.2596 0.2958 225 1.1377 12163592
0.1793 0.3024 230 1.1370 12425752
0.248 0.3090 235 1.1325 12694640
0.2415 0.3156 240 1.1331 12963992
0.2047 0.3221 245 1.1319 13234768
0.1848 0.3287 250 1.1310 13511432
0.1624 0.3353 255 1.1309 13785032
0.2183 0.3419 260 1.1269 14052560
0.2079 0.3484 265 1.1321 14318664
0.1957 0.3550 270 1.1292 14591392
0.1832 0.3616 275 1.1273 14857944
0.2016 0.3681 280 1.1240 15133456
0.2329 0.3747 285 1.1258 15404048
0.2867 0.3813 290 1.1256 15674488
0.2546 0.3879 295 1.1245 15950072
0.2182 0.3944 300 1.1226 16211512
0.2931 0.4010 305 1.1222 16484192
0.2325 0.4076 310 1.1228 16754264
0.2637 0.4142 315 1.1211 17023608
0.1728 0.4207 320 1.1188 17305976
0.2263 0.4273 325 1.1195 17575456
0.2625 0.4339 330 1.1184 17840744
0.1631 0.4405 335 1.1177 18105176
0.1778 0.4470 340 1.1180 18369064
0.327 0.4536 345 1.1150 18635856
0.2488 0.4602 350 1.1160 18906504
0.2863 0.4668 355 1.1146 19171744
0.2554 0.4733 360 1.1152 19443216
0.2097 0.4799 365 1.1171 19710312
0.2428 0.4865 370 1.1147 19983280
0.1757 0.4931 375 1.1157 20253048
0.2844 0.4996 380 1.1143 20521536
0.2519 0.5062 385 1.1135 20793304
0.14 0.5128 390 1.1135 21056880
0.175 0.5194 395 1.1139 21322760
0.2719 0.5259 400 1.1138 21588632
0.2211 0.5325 405 1.1119 21863192
0.2711 0.5391 410 1.1115 22136640
0.2192 0.5456 415 1.1097 22400024
0.2555 0.5522 420 1.1088 22663600
0.2381 0.5588 425 1.1071 22931864
0.287 0.5654 430 1.1090 23211784
0.2197 0.5719 435 1.1079 23473528
0.1785 0.5785 440 1.1071 23741512
0.1782 0.5851 445 1.1088 24013864
0.1792 0.5917 450 1.1081 24283944
0.2492 0.5982 455 1.1053 24555032
0.2555 0.6048 460 1.1070 24818080
0.2014 0.6114 465 1.1091 25091208
0.1869 0.6180 470 1.1049 25354352
0.2532 0.6245 475 1.1049 25626256
0.2373 0.6311 480 1.1082 25900944
0.1992 0.6377 485 1.1064 26173568
0.2187 0.6443 490 1.1063 26447272
0.2218 0.6508 495 1.1089 26715952
0.2322 0.6574 500 1.1061 26983200
0.2482 0.6640 505 1.1060 27247440
0.1582 0.6706 510 1.1054 27515256
0.2757 0.6771 515 1.1051 27778344
0.1809 0.6837 520 1.1047 28049984
0.2369 0.6903 525 1.1042 28324744
0.2848 0.6969 530 1.1050 28589688
0.2827 0.7034 535 1.1021 28861280
0.2411 0.7100 540 1.1027 29129832
0.2118 0.7166 545 1.1020 29399128
0.1694 0.7231 550 1.1019 29669072
0.234 0.7297 555 1.1027 29932936
0.2118 0.7363 560 1.1031 30200984
0.2381 0.7429 565 1.1006 30467952
0.2596 0.7494 570 1.1016 30740152
0.2517 0.7560 575 1.1025 31013280
0.2295 0.7626 580 1.1009 31283736
0.2093 0.7692 585 1.1000 31546048
0.2714 0.7757 590 1.1016 31810008
0.1723 0.7823 595 1.0997 32082696
0.2339 0.7889 600 1.0983 32349272
0.2226 0.7955 605 1.0987 32617856
0.24 0.8020 610 1.0993 32890144
0.2459 0.8086 615 1.0978 33155616
0.2352 0.8152 620 1.0977 33421616
0.1846 0.8218 625 1.1003 33689760
0.1827 0.8283 630 1.0984 33954944
0.2186 0.8349 635 1.0991 34220096
0.1833 0.8415 640 1.1003 34487888
0.2651 0.8481 645 1.0984 34759656
0.2547 0.8546 650 1.0970 35032040
0.1985 0.8612 655 1.0965 35302816
0.2972 0.8678 660 1.0979 35576712
0.2817 0.8744 665 1.0956 35850400
0.2383 0.8809 670 1.0975 36121904
0.1814 0.8875 675 1.0993 36393368
0.2137 0.8941 680 1.0943 36664864
0.1752 0.9006 685 1.0941 36939200
0.2005 0.9072 690 1.0983 37205904
0.3429 0.9138 695 1.0970 37482984
0.2312 0.9204 700 1.0943 37755048
0.1952 0.9269 705 1.0958 38019952
0.2054 0.9335 710 1.0963 38291888
0.2247 0.9401 715 1.0958 38561640
0.1912 0.9467 720 1.0958 38835512
0.2334 0.9532 725 1.0964 39110024
0.1795 0.9598 730 1.0948 39382208
0.1963 0.9664 735 1.0946 39654856
0.2492 0.9730 740 1.0952 39930376
0.2831 0.9795 745 1.0927 40202200
0.2232 0.9861 750 1.0936 40469640
0.1724 0.9927 755 1.0955 40736256
0.2259 0.9993 760 1.0940 41008256
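
As a rough sanity check on the token counts, the final row implies an average sequence length of about 420 tokens. A back-of-the-envelope computation, assuming every optimizer step processes the full effective batch of 128 sequences:

```python
# Rough average sequence length implied by the final table row.
total_tokens = 41_008_256   # input tokens seen at step 760
steps = 760                 # optimizer steps in the last row
effective_batch = 128       # total_train_batch_size from the hyperparameters

sequences = steps * effective_batch   # 97,280 sequences
print(total_tokens / sequences)       # -> approx. 421.6 tokens per sequence
```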

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
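
To reproduce the training environment, these versions can be checked at runtime. A small sketch using the standard import names:

```python
import transformers, torch, datasets, tokenizers

# Versions reported in this card; mismatches may change results slightly.
expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
for module in (transformers, torch, datasets, tokenizers):
    print(module.__name__, module.__version__,
          "(expected", expected[module.__name__] + ")")
```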
Model size: 2.61B parameters (Safetensors, BF16)

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd1

  • Base model: google/gemma-2-2b
  • This model: fine-tuned from the base model