# collapse_gemma-2-2b_hs2_accumulate_iter11_sftsd2
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1036
- Num Input Tokens Seen: 56860576
## Model description

More information needed
## Intended uses & limitations

More information needed
## Training and evaluation data

More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
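The reported `total_train_batch_size` and the warmup length follow directly from the settings above. A minimal sketch of the arithmetic (the total step count of roughly 1049 is estimated from the training log, which ends near step 1045 at epoch ~0.995):

```python
# Effective batch size = per-device batch size x gradient accumulation steps.
train_batch_size = 8
gradient_accumulation_steps = 16
total_train_batch_size = train_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 128  # matches the reported value

# Warmup length for constant_with_warmup: ratio x total optimizer steps.
# ~1049 total steps is an estimate read off the training log, not a logged value.
warmup_ratio = 0.05
approx_total_steps = 1049
warmup_steps = int(warmup_ratio * approx_total_steps)
print(total_train_batch_size, warmup_steps)
```

So the learning rate ramps up over roughly the first 50 optimizer steps and then stays constant at 8e-06 for the rest of the epoch.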
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3909 | 0 |
1.6619 | 0.0048 | 5 | 1.3885 | 269208 |
1.5561 | 0.0095 | 10 | 1.3670 | 533104 |
1.5456 | 0.0143 | 15 | 1.3196 | 801872 |
1.417 | 0.0190 | 20 | 1.2681 | 1069744 |
1.309 | 0.0238 | 25 | 1.2331 | 1347056 |
1.2013 | 0.0286 | 30 | 1.1947 | 1624792 |
1.1202 | 0.0333 | 35 | 1.1898 | 1890872 |
1.0145 | 0.0381 | 40 | 1.2165 | 2160520 |
0.9685 | 0.0429 | 45 | 1.2058 | 2434752 |
0.8294 | 0.0476 | 50 | 1.2506 | 2702528 |
0.7017 | 0.0524 | 55 | 1.2530 | 2964728 |
0.6012 | 0.0571 | 60 | 1.2459 | 3236840 |
0.5699 | 0.0619 | 65 | 1.2474 | 3504304 |
0.5568 | 0.0667 | 70 | 1.2397 | 3774176 |
0.459 | 0.0714 | 75 | 1.2315 | 4045968 |
0.4137 | 0.0762 | 80 | 1.2152 | 4317176 |
0.3721 | 0.0810 | 85 | 1.2139 | 4593360 |
0.3437 | 0.0857 | 90 | 1.2082 | 4859344 |
0.2922 | 0.0905 | 95 | 1.2201 | 5128960 |
0.3979 | 0.0952 | 100 | 1.2055 | 5402832 |
0.3391 | 0.1000 | 105 | 1.1999 | 5668728 |
0.2973 | 0.1048 | 110 | 1.1868 | 5941816 |
0.2907 | 0.1095 | 115 | 1.2060 | 6211888 |
0.2166 | 0.1143 | 120 | 1.1917 | 6476504 |
0.3361 | 0.1191 | 125 | 1.2059 | 6748360 |
0.2939 | 0.1238 | 130 | 1.1898 | 7020304 |
0.2627 | 0.1286 | 135 | 1.2001 | 7298768 |
0.2605 | 0.1333 | 140 | 1.1872 | 7562912 |
0.2749 | 0.1381 | 145 | 1.1898 | 7831408 |
0.2645 | 0.1429 | 150 | 1.1817 | 8105208 |
0.2205 | 0.1476 | 155 | 1.1854 | 8374344 |
0.2723 | 0.1524 | 160 | 1.1811 | 8649800 |
0.27 | 0.1572 | 165 | 1.1824 | 8924904 |
0.2381 | 0.1619 | 170 | 1.1850 | 9194288 |
0.155 | 0.1667 | 175 | 1.1805 | 9460784 |
0.216 | 0.1714 | 180 | 1.1811 | 9734216 |
0.1763 | 0.1762 | 185 | 1.1816 | 10007184 |
0.1992 | 0.1810 | 190 | 1.1778 | 10282216 |
0.1878 | 0.1857 | 195 | 1.1736 | 10559320 |
0.2359 | 0.1905 | 200 | 1.1746 | 10832344 |
0.1692 | 0.1953 | 205 | 1.1731 | 11102976 |
0.2311 | 0.2000 | 210 | 1.1718 | 11377528 |
0.1974 | 0.2048 | 215 | 1.1683 | 11644848 |
0.2118 | 0.2095 | 220 | 1.1726 | 11917864 |
0.2069 | 0.2143 | 225 | 1.1676 | 12188040 |
0.2559 | 0.2191 | 230 | 1.1680 | 12460936 |
0.2865 | 0.2238 | 235 | 1.1701 | 12732608 |
0.2321 | 0.2286 | 240 | 1.1647 | 13005728 |
0.1391 | 0.2334 | 245 | 1.1630 | 13278536 |
0.2632 | 0.2381 | 250 | 1.1606 | 13549648 |
0.2139 | 0.2429 | 255 | 1.1660 | 13814824 |
0.1839 | 0.2476 | 260 | 1.1591 | 14088816 |
0.2781 | 0.2524 | 265 | 1.1601 | 14360152 |
0.2523 | 0.2572 | 270 | 1.1583 | 14634592 |
0.2295 | 0.2619 | 275 | 1.1594 | 14901720 |
0.1748 | 0.2667 | 280 | 1.1594 | 15169568 |
0.1853 | 0.2715 | 285 | 1.1590 | 15442312 |
0.2221 | 0.2762 | 290 | 1.1550 | 15712784 |
0.2346 | 0.2810 | 295 | 1.1557 | 15983504 |
0.1717 | 0.2857 | 300 | 1.1566 | 16252736 |
0.2572 | 0.2905 | 305 | 1.1528 | 16522656 |
0.1948 | 0.2953 | 310 | 1.1501 | 16804384 |
0.2072 | 0.3000 | 315 | 1.1551 | 17069304 |
0.2233 | 0.3048 | 320 | 1.1495 | 17347080 |
0.1787 | 0.3096 | 325 | 1.1473 | 17619848 |
0.1904 | 0.3143 | 330 | 1.1498 | 17894792 |
0.1648 | 0.3191 | 335 | 1.1474 | 18164360 |
0.1701 | 0.3238 | 340 | 1.1451 | 18438616 |
0.2417 | 0.3286 | 345 | 1.1465 | 18707840 |
0.2617 | 0.3334 | 350 | 1.1449 | 18975640 |
0.1717 | 0.3381 | 355 | 1.1437 | 19243800 |
0.2343 | 0.3429 | 360 | 1.1431 | 19518544 |
0.1921 | 0.3477 | 365 | 1.1381 | 19781600 |
0.1478 | 0.3524 | 370 | 1.1447 | 20050016 |
0.2128 | 0.3572 | 375 | 1.1449 | 20326736 |
0.2403 | 0.3619 | 380 | 1.1369 | 20593560 |
0.204 | 0.3667 | 385 | 1.1410 | 20865264 |
0.2372 | 0.3715 | 390 | 1.1429 | 21133128 |
0.2333 | 0.3762 | 395 | 1.1395 | 21402880 |
0.1617 | 0.3810 | 400 | 1.1404 | 21676584 |
0.1994 | 0.3858 | 405 | 1.1383 | 21956864 |
0.2082 | 0.3905 | 410 | 1.1367 | 22231848 |
0.1889 | 0.3953 | 415 | 1.1379 | 22504232 |
0.2024 | 0.4000 | 420 | 1.1365 | 22773424 |
0.1367 | 0.4048 | 425 | 1.1375 | 23040760 |
0.1531 | 0.4096 | 430 | 1.1358 | 23311384 |
0.2435 | 0.4143 | 435 | 1.1336 | 23579336 |
0.2576 | 0.4191 | 440 | 1.1348 | 23855568 |
0.1846 | 0.4239 | 445 | 1.1339 | 24131984 |
0.1664 | 0.4286 | 450 | 1.1335 | 24400920 |
0.1921 | 0.4334 | 455 | 1.1346 | 24671968 |
0.2055 | 0.4381 | 460 | 1.1328 | 24947688 |
0.2394 | 0.4429 | 465 | 1.1299 | 25216584 |
0.0912 | 0.4477 | 470 | 1.1309 | 25479304 |
0.1602 | 0.4524 | 475 | 1.1316 | 25751328 |
0.1711 | 0.4572 | 480 | 1.1297 | 26020464 |
0.1851 | 0.4620 | 485 | 1.1304 | 26297840 |
0.1544 | 0.4667 | 490 | 1.1306 | 26564208 |
0.2246 | 0.4715 | 495 | 1.1292 | 26837544 |
0.2593 | 0.4762 | 500 | 1.1293 | 27106200 |
0.1452 | 0.4810 | 505 | 1.1279 | 27379904 |
0.1888 | 0.4858 | 510 | 1.1285 | 27650704 |
0.1808 | 0.4905 | 515 | 1.1260 | 27917048 |
0.1349 | 0.4953 | 520 | 1.1271 | 28191080 |
0.1523 | 0.5001 | 525 | 1.1260 | 28456360 |
0.1804 | 0.5048 | 530 | 1.1271 | 28728472 |
0.1876 | 0.5096 | 535 | 1.1252 | 28996608 |
0.1901 | 0.5143 | 540 | 1.1257 | 29268304 |
0.1649 | 0.5191 | 545 | 1.1258 | 29534384 |
0.2207 | 0.5239 | 550 | 1.1258 | 29802704 |
0.1712 | 0.5286 | 555 | 1.1253 | 30074840 |
0.1941 | 0.5334 | 560 | 1.1235 | 30347512 |
0.1767 | 0.5382 | 565 | 1.1262 | 30613424 |
0.226 | 0.5429 | 570 | 1.1246 | 30886904 |
0.1604 | 0.5477 | 575 | 1.1226 | 31159704 |
0.1883 | 0.5524 | 580 | 1.1239 | 31438992 |
0.1438 | 0.5572 | 585 | 1.1238 | 31713696 |
0.1358 | 0.5620 | 590 | 1.1234 | 31989648 |
0.2459 | 0.5667 | 595 | 1.1219 | 32257152 |
0.1788 | 0.5715 | 600 | 1.1241 | 32528856 |
0.1915 | 0.5763 | 605 | 1.1232 | 32801536 |
0.1908 | 0.5810 | 610 | 1.1195 | 33067456 |
0.1838 | 0.5858 | 615 | 1.1215 | 33343248 |
0.1612 | 0.5905 | 620 | 1.1214 | 33614488 |
0.1305 | 0.5953 | 625 | 1.1185 | 33880584 |
0.1575 | 0.6001 | 630 | 1.1196 | 34151360 |
0.1482 | 0.6048 | 635 | 1.1222 | 34429648 |
0.1527 | 0.6096 | 640 | 1.1196 | 34713128 |
0.1519 | 0.6144 | 645 | 1.1201 | 34985712 |
0.1264 | 0.6191 | 650 | 1.1236 | 35249112 |
0.1938 | 0.6239 | 655 | 1.1188 | 35533992 |
0.1878 | 0.6286 | 660 | 1.1181 | 35799384 |
0.1363 | 0.6334 | 665 | 1.1197 | 36069736 |
0.2028 | 0.6382 | 670 | 1.1183 | 36342600 |
0.2482 | 0.6429 | 675 | 1.1157 | 36619536 |
0.1125 | 0.6477 | 680 | 1.1177 | 36896976 |
0.0909 | 0.6525 | 685 | 1.1208 | 37164168 |
0.2006 | 0.6572 | 690 | 1.1150 | 37434024 |
0.1549 | 0.6620 | 695 | 1.1159 | 37703384 |
0.2242 | 0.6667 | 700 | 1.1172 | 37976752 |
0.2624 | 0.6715 | 705 | 1.1150 | 38254056 |
0.2141 | 0.6763 | 710 | 1.1147 | 38520576 |
0.2093 | 0.6810 | 715 | 1.1186 | 38791592 |
0.199 | 0.6858 | 720 | 1.1183 | 39062104 |
0.16 | 0.6906 | 725 | 1.1158 | 39333144 |
0.1316 | 0.6953 | 730 | 1.1164 | 39601296 |
0.1405 | 0.7001 | 735 | 1.1165 | 39870328 |
0.164 | 0.7048 | 740 | 1.1156 | 40143536 |
0.2407 | 0.7096 | 745 | 1.1165 | 40413120 |
0.1927 | 0.7144 | 750 | 1.1157 | 40679328 |
0.1008 | 0.7191 | 755 | 1.1148 | 40954008 |
0.1801 | 0.7239 | 760 | 1.1155 | 41224728 |
0.1303 | 0.7287 | 765 | 1.1153 | 41500888 |
0.1614 | 0.7334 | 770 | 1.1137 | 41772264 |
0.1058 | 0.7382 | 775 | 1.1131 | 42037464 |
0.1393 | 0.7429 | 780 | 1.1144 | 42308296 |
0.1357 | 0.7477 | 785 | 1.1115 | 42577088 |
0.2385 | 0.7525 | 790 | 1.1114 | 42850208 |
0.1819 | 0.7572 | 795 | 1.1105 | 43131992 |
0.1754 | 0.7620 | 800 | 1.1143 | 43404648 |
0.1844 | 0.7668 | 805 | 1.1126 | 43675664 |
0.1651 | 0.7715 | 810 | 1.1107 | 43944080 |
0.1492 | 0.7763 | 815 | 1.1111 | 44212952 |
0.2447 | 0.7810 | 820 | 1.1129 | 44489064 |
0.2831 | 0.7858 | 825 | 1.1116 | 44757776 |
0.198 | 0.7906 | 830 | 1.1119 | 45025712 |
0.2413 | 0.7953 | 835 | 1.1144 | 45298744 |
0.2419 | 0.8001 | 840 | 1.1142 | 45569856 |
0.212 | 0.8049 | 845 | 1.1119 | 45843440 |
0.1282 | 0.8096 | 850 | 1.1096 | 46118392 |
0.2365 | 0.8144 | 855 | 1.1117 | 46387616 |
0.1231 | 0.8191 | 860 | 1.1110 | 46655568 |
0.1475 | 0.8239 | 865 | 1.1119 | 46926024 |
0.1728 | 0.8287 | 870 | 1.1104 | 47191592 |
0.1555 | 0.8334 | 875 | 1.1096 | 47468280 |
0.2101 | 0.8382 | 880 | 1.1083 | 47734104 |
0.1643 | 0.8430 | 885 | 1.1096 | 48010136 |
0.2671 | 0.8477 | 890 | 1.1119 | 48275232 |
0.2283 | 0.8525 | 895 | 1.1099 | 48542904 |
0.249 | 0.8572 | 900 | 1.1075 | 48814896 |
0.1618 | 0.8620 | 905 | 1.1086 | 49086616 |
0.1733 | 0.8668 | 910 | 1.1097 | 49358688 |
0.1571 | 0.8715 | 915 | 1.1093 | 49630792 |
0.207 | 0.8763 | 920 | 1.1101 | 49901280 |
0.2012 | 0.8811 | 925 | 1.1088 | 50172728 |
0.1682 | 0.8858 | 930 | 1.1079 | 50439696 |
0.1735 | 0.8906 | 935 | 1.1068 | 50710368 |
0.1766 | 0.8953 | 940 | 1.1095 | 50975280 |
0.1292 | 0.9001 | 945 | 1.1081 | 51245360 |
0.1688 | 0.9049 | 950 | 1.1079 | 51515104 |
0.1044 | 0.9096 | 955 | 1.1096 | 51781736 |
0.1414 | 0.9144 | 960 | 1.1114 | 52043768 |
0.1954 | 0.9192 | 965 | 1.1054 | 52312032 |
0.2114 | 0.9239 | 970 | 1.1042 | 52586368 |
0.2029 | 0.9287 | 975 | 1.1070 | 52858112 |
0.2393 | 0.9334 | 980 | 1.1046 | 53127184 |
0.1397 | 0.9382 | 985 | 1.1042 | 53393096 |
0.1867 | 0.9430 | 990 | 1.1053 | 53666584 |
0.1785 | 0.9477 | 995 | 1.1069 | 53930960 |
0.1624 | 0.9525 | 1000 | 1.1081 | 54208544 |
0.204 | 0.9573 | 1005 | 1.1064 | 54480672 |
0.2185 | 0.9620 | 1010 | 1.1057 | 54753248 |
0.1201 | 0.9668 | 1015 | 1.1045 | 55020320 |
0.2427 | 0.9715 | 1020 | 1.1038 | 55288376 |
0.1832 | 0.9763 | 1025 | 1.1038 | 55555816 |
0.1387 | 0.9811 | 1030 | 1.1040 | 55829144 |
0.1508 | 0.9858 | 1035 | 1.1034 | 56106152 |
0.2211 | 0.9906 | 1040 | 1.1030 | 56371320 |
0.2265 | 0.9954 | 1045 | 1.1044 | 56641808 |
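The log above also allows a rough throughput sanity check. A small sketch, using the token counters from two logged rows (step 5 and step 1045) to estimate tokens consumed per optimizer step and the implied average sequence length:

```python
# Tokens per optimizer step, from two rows of the training log:
# step 5 -> 269,208 tokens seen; step 1045 -> 56,641,808 tokens seen.
steps = 1045 - 5
tokens = 56_641_808 - 269_208
tokens_per_step = tokens / steps  # roughly 54k tokens per step

# With an effective batch of 128 sequences per optimizer step,
# the implied average sequence length is tokens_per_step / 128.
avg_seq_len = tokens_per_step / 128  # roughly 420-430 tokens
print(round(tokens_per_step), round(avg_seq_len))
```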
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
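To approximate the training environment, the versions above can be pinned. A hedged sketch of a `requirements.txt` (a CUDA 12.1 build of PyTorch is implied by the `+cu121` tag and may require the matching wheel index):

```
transformers==4.44.0
torch==2.4.0
datasets==2.20.0
tokenizers==0.19.1
```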
## Model tree

- Base model: google/gemma-2-2b
- This model: RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter11_sftsd2