# collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd2
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1029
- Num Input Tokens Seen: 41091672
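
The checkpoint should load through the standard `transformers` causal-LM API like its Gemma base. A minimal usage sketch (the prompt and generation settings are illustrative, not from the original evaluation):

```python
# Minimal usage sketch; assumes this checkpoint loads like its
# google/gemma-2-2b base via the standard transformers API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)  # illustrative settings
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```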
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
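
The total train batch size follows from the per-device batch size and accumulation: 8 × 16 = 128 sequences per optimizer step (on a single device). As a sketch, an equivalent `transformers` `TrainingArguments` configuration would look roughly like this; the output path is a hypothetical placeholder, and the Adam betas/epsilon listed above match the library defaults:

```python
# Configuration sketch reconstructed from the hyperparameter list above;
# output_dir is hypothetical, and Adam betas/epsilon are the defaults.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd2",  # hypothetical
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 effective batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```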
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.5665 | 0.0066 | 5 | 1.3873 | 272560 |
1.5456 | 0.0132 | 10 | 1.3529 | 547080 |
1.4344 | 0.0198 | 15 | 1.2836 | 822880 |
1.4512 | 0.0264 | 20 | 1.2345 | 1089864 |
1.3462 | 0.0330 | 25 | 1.1901 | 1361576 |
1.1712 | 0.0396 | 30 | 1.1835 | 1634504 |
1.0826 | 0.0462 | 35 | 1.1964 | 1895936 |
0.9291 | 0.0527 | 40 | 1.1914 | 2166120 |
0.8296 | 0.0593 | 45 | 1.2208 | 2435904 |
0.6654 | 0.0659 | 50 | 1.2499 | 2706240 |
0.6401 | 0.0725 | 55 | 1.2356 | 2984976 |
0.6449 | 0.0791 | 60 | 1.2089 | 3257728 |
0.5585 | 0.0857 | 65 | 1.2026 | 3526976 |
0.468 | 0.0923 | 70 | 1.2120 | 3804888 |
0.5271 | 0.0989 | 75 | 1.2040 | 4078544 |
0.3901 | 0.1055 | 80 | 1.1976 | 4356048 |
0.4389 | 0.1121 | 85 | 1.2049 | 4621624 |
0.3482 | 0.1187 | 90 | 1.1972 | 4888632 |
0.3224 | 0.1253 | 95 | 1.1926 | 5152168 |
0.4305 | 0.1319 | 100 | 1.1944 | 5423968 |
0.3758 | 0.1385 | 105 | 1.1825 | 5697240 |
0.3646 | 0.1450 | 110 | 1.1919 | 5971384 |
0.3215 | 0.1516 | 115 | 1.1776 | 6240360 |
0.3273 | 0.1582 | 120 | 1.1907 | 6509288 |
0.3152 | 0.1648 | 125 | 1.1786 | 6779048 |
0.2365 | 0.1714 | 130 | 1.1833 | 7048200 |
0.3342 | 0.1780 | 135 | 1.1750 | 7316656 |
0.3586 | 0.1846 | 140 | 1.1774 | 7590728 |
0.2927 | 0.1912 | 145 | 1.1737 | 7859680 |
0.3788 | 0.1978 | 150 | 1.1760 | 8126224 |
0.2964 | 0.2044 | 155 | 1.1741 | 8403808 |
0.2938 | 0.2110 | 160 | 1.1677 | 8672216 |
0.2518 | 0.2176 | 165 | 1.1735 | 8946264 |
0.3334 | 0.2242 | 170 | 1.1647 | 9208352 |
0.311 | 0.2308 | 175 | 1.1647 | 9477208 |
0.3065 | 0.2373 | 180 | 1.1620 | 9748024 |
0.2517 | 0.2439 | 185 | 1.1613 | 10021768 |
0.2672 | 0.2505 | 190 | 1.1569 | 10293208 |
0.2611 | 0.2571 | 195 | 1.1545 | 10569280 |
0.2265 | 0.2637 | 200 | 1.1548 | 10840984 |
0.3068 | 0.2703 | 205 | 1.1520 | 11116568 |
0.2929 | 0.2769 | 210 | 1.1568 | 11394928 |
0.3351 | 0.2835 | 215 | 1.1547 | 11666600 |
0.2687 | 0.2901 | 220 | 1.1544 | 11946656 |
0.2501 | 0.2967 | 225 | 1.1479 | 12224240 |
0.1991 | 0.3033 | 230 | 1.1520 | 12500672 |
0.2434 | 0.3099 | 235 | 1.1477 | 12767840 |
0.1667 | 0.3165 | 240 | 1.1453 | 13035688 |
0.2564 | 0.3231 | 245 | 1.1509 | 13312232 |
0.2856 | 0.3297 | 250 | 1.1436 | 13584328 |
0.305 | 0.3362 | 255 | 1.1425 | 13853288 |
0.2765 | 0.3428 | 260 | 1.1456 | 14113512 |
0.2209 | 0.3494 | 265 | 1.1455 | 14385280 |
0.2125 | 0.3560 | 270 | 1.1410 | 14660096 |
0.274 | 0.3626 | 275 | 1.1417 | 14931976 |
0.2181 | 0.3692 | 280 | 1.1411 | 15202008 |
0.2481 | 0.3758 | 285 | 1.1374 | 15468896 |
0.2629 | 0.3824 | 290 | 1.1372 | 15733744 |
0.2826 | 0.3890 | 295 | 1.1366 | 16004424 |
0.2646 | 0.3956 | 300 | 1.1363 | 16276088 |
0.2729 | 0.4022 | 305 | 1.1333 | 16547304 |
0.2735 | 0.4088 | 310 | 1.1350 | 16819224 |
0.2881 | 0.4154 | 315 | 1.1349 | 17088704 |
0.2208 | 0.4220 | 320 | 1.1304 | 17362560 |
0.1822 | 0.4285 | 325 | 1.1348 | 17632840 |
0.3197 | 0.4351 | 330 | 1.1306 | 17903232 |
0.1763 | 0.4417 | 335 | 1.1287 | 18171208 |
0.2851 | 0.4483 | 340 | 1.1333 | 18444312 |
0.2406 | 0.4549 | 345 | 1.1318 | 18716768 |
0.2571 | 0.4615 | 350 | 1.1291 | 18983016 |
0.3931 | 0.4681 | 355 | 1.1282 | 19256840 |
0.1952 | 0.4747 | 360 | 1.1287 | 19527776 |
0.227 | 0.4813 | 365 | 1.1282 | 19800232 |
0.2979 | 0.4879 | 370 | 1.1285 | 20074720 |
0.1515 | 0.4945 | 375 | 1.1280 | 20350824 |
0.336 | 0.5011 | 380 | 1.1254 | 20627392 |
0.2381 | 0.5077 | 385 | 1.1258 | 20900344 |
0.2331 | 0.5143 | 390 | 1.1253 | 21173120 |
0.2176 | 0.5209 | 395 | 1.1250 | 21442720 |
0.232 | 0.5274 | 400 | 1.1268 | 21711376 |
0.2648 | 0.5340 | 405 | 1.1246 | 21977752 |
0.2398 | 0.5406 | 410 | 1.1241 | 22247224 |
0.2246 | 0.5472 | 415 | 1.1245 | 22525976 |
0.2836 | 0.5538 | 420 | 1.1199 | 22795472 |
0.242 | 0.5604 | 425 | 1.1233 | 23063720 |
0.2369 | 0.5670 | 430 | 1.1230 | 23333144 |
0.2856 | 0.5736 | 435 | 1.1206 | 23599032 |
0.2595 | 0.5802 | 440 | 1.1208 | 23871616 |
0.2154 | 0.5868 | 445 | 1.1188 | 24144160 |
0.2541 | 0.5934 | 450 | 1.1208 | 24412552 |
0.2378 | 0.6000 | 455 | 1.1210 | 24683400 |
0.233 | 0.6066 | 460 | 1.1183 | 24956656 |
0.3136 | 0.6132 | 465 | 1.1211 | 25235888 |
0.2549 | 0.6197 | 470 | 1.1185 | 25505944 |
0.259 | 0.6263 | 475 | 1.1179 | 25776080 |
0.1539 | 0.6329 | 480 | 1.1197 | 26043984 |
0.2459 | 0.6395 | 485 | 1.1183 | 26318896 |
0.2342 | 0.6461 | 490 | 1.1182 | 26585616 |
0.2173 | 0.6527 | 495 | 1.1172 | 26862168 |
0.3048 | 0.6593 | 500 | 1.1172 | 27130760 |
0.2851 | 0.6659 | 505 | 1.1142 | 27397928 |
0.2091 | 0.6725 | 510 | 1.1148 | 27670712 |
0.3143 | 0.6791 | 515 | 1.1149 | 27933056 |
0.1672 | 0.6857 | 520 | 1.1152 | 28201952 |
0.3181 | 0.6923 | 525 | 1.1164 | 28477464 |
0.1914 | 0.6989 | 530 | 1.1174 | 28743664 |
0.2931 | 0.7055 | 535 | 1.1155 | 29016592 |
0.2285 | 0.7120 | 540 | 1.1133 | 29283872 |
0.2749 | 0.7186 | 545 | 1.1163 | 29554240 |
0.2901 | 0.7252 | 550 | 1.1145 | 29821128 |
0.2361 | 0.7318 | 555 | 1.1114 | 30095352 |
0.2654 | 0.7384 | 560 | 1.1125 | 30371160 |
0.1935 | 0.7450 | 565 | 1.1129 | 30645928 |
0.268 | 0.7516 | 570 | 1.1101 | 30919376 |
0.1795 | 0.7582 | 575 | 1.1139 | 31186848 |
0.2439 | 0.7648 | 580 | 1.1122 | 31459480 |
0.259 | 0.7714 | 585 | 1.1091 | 31733560 |
0.248 | 0.7780 | 590 | 1.1105 | 32003016 |
0.2186 | 0.7846 | 595 | 1.1106 | 32278448 |
0.1595 | 0.7912 | 600 | 1.1115 | 32538192 |
0.2058 | 0.7978 | 605 | 1.1117 | 32816064 |
0.2324 | 0.8044 | 610 | 1.1095 | 33087144 |
0.2045 | 0.8109 | 615 | 1.1094 | 33353000 |
0.2333 | 0.8175 | 620 | 1.1095 | 33621888 |
0.2159 | 0.8241 | 625 | 1.1076 | 33888104 |
0.2866 | 0.8307 | 630 | 1.1094 | 34159240 |
0.2268 | 0.8373 | 635 | 1.1101 | 34430064 |
0.1753 | 0.8439 | 640 | 1.1100 | 34700128 |
0.2076 | 0.8505 | 645 | 1.1089 | 34968768 |
0.1912 | 0.8571 | 650 | 1.1069 | 35250136 |
0.1534 | 0.8637 | 655 | 1.1074 | 35524024 |
0.1424 | 0.8703 | 660 | 1.1083 | 35789520 |
0.2325 | 0.8769 | 665 | 1.1076 | 36067376 |
0.2607 | 0.8835 | 670 | 1.1046 | 36340512 |
0.234 | 0.8901 | 675 | 1.1048 | 36603160 |
0.232 | 0.8967 | 680 | 1.1081 | 36872480 |
0.2998 | 0.9032 | 685 | 1.1080 | 37146736 |
0.1921 | 0.9098 | 690 | 1.1045 | 37414776 |
0.2492 | 0.9164 | 695 | 1.1060 | 37685600 |
0.27 | 0.9230 | 700 | 1.1068 | 37949648 |
0.2159 | 0.9296 | 705 | 1.1046 | 38226312 |
0.1912 | 0.9362 | 710 | 1.1062 | 38502072 |
0.23 | 0.9428 | 715 | 1.1076 | 38772744 |
0.3387 | 0.9494 | 720 | 1.1054 | 39041632 |
0.23 | 0.9560 | 725 | 1.1051 | 39313560 |
0.2785 | 0.9626 | 730 | 1.1065 | 39585992 |
0.2116 | 0.9692 | 735 | 1.1030 | 39856632 |
0.2378 | 0.9758 | 740 | 1.1040 | 40120176 |
0.2006 | 0.9824 | 745 | 1.1046 | 40392064 |
0.2418 | 0.9890 | 750 | 1.1024 | 40664776 |
0.2041 | 0.9955 | 755 | 1.1028 | 40931592 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1