# collapse_gemma-2-2b_hs2_accumulate_iter11_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0972
- Num Input Tokens Seen: 57637736
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
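The batch-size settings above fit together: with a per-device train batch size of 8 and 16 gradient-accumulation steps, each optimizer update sees 8 × 16 = 128 examples, which matches the reported `total_train_batch_size`. A minimal arithmetic sketch in plain Python (the total step count of roughly 1048 is not stated directly; it is inferred here from the training-results table, where step 1045 corresponds to epoch 0.9969):

```python
# Effective batch size per optimizer step: per-device batch size
# times gradient-accumulation steps (single device assumed).
train_batch_size = 8
gradient_accumulation_steps = 16
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 128, matching the reported value

# Approximate warmup length for constant_with_warmup:
# warmup_ratio times the total number of optimizer steps.
# Total steps (~1048) inferred from the log: step 1045 at epoch 0.9969.
total_steps = round(1045 / 0.9969)
warmup_steps = int(0.05 * total_steps)
print(total_steps, warmup_steps)  # roughly 1048 total steps, 52 warmup steps
```

This implies the learning rate warms up over only the first ~52 optimizer updates and then stays constant at 8e-06 for the remainder of the single epoch.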
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3909 | 0 |
1.6702 | 0.0048 | 5 | 1.3890 | 274216 |
1.7923 | 0.0095 | 10 | 1.3680 | 541184 |
1.4986 | 0.0143 | 15 | 1.3206 | 819976 |
1.5142 | 0.0191 | 20 | 1.2688 | 1095056 |
1.3889 | 0.0238 | 25 | 1.2318 | 1370656 |
1.3587 | 0.0286 | 30 | 1.1985 | 1645656 |
1.1457 | 0.0334 | 35 | 1.1961 | 1925936 |
0.9981 | 0.0382 | 40 | 1.2120 | 2197920 |
0.8253 | 0.0429 | 45 | 1.2501 | 2472376 |
0.7178 | 0.0477 | 50 | 1.2900 | 2748904 |
0.5007 | 0.0525 | 55 | 1.3346 | 3016944 |
0.444 | 0.0572 | 60 | 1.2951 | 3291264 |
0.4447 | 0.0620 | 65 | 1.2466 | 3563840 |
0.3044 | 0.0668 | 70 | 1.2693 | 3830896 |
0.2642 | 0.0715 | 75 | 1.2227 | 4102680 |
0.3298 | 0.0763 | 80 | 1.2361 | 4379024 |
0.2985 | 0.0811 | 85 | 1.2145 | 4655328 |
0.2451 | 0.0859 | 90 | 1.2161 | 4937112 |
0.2532 | 0.0906 | 95 | 1.2096 | 5218128 |
0.375 | 0.0954 | 100 | 1.1938 | 5493576 |
0.289 | 0.1002 | 105 | 1.1947 | 5770328 |
0.1574 | 0.1049 | 110 | 1.1874 | 6049328 |
0.3759 | 0.1097 | 115 | 1.1851 | 6325552 |
0.2427 | 0.1145 | 120 | 1.1837 | 6610192 |
0.2257 | 0.1192 | 125 | 1.1819 | 6892072 |
0.1955 | 0.1240 | 130 | 1.1734 | 7171040 |
0.2644 | 0.1288 | 135 | 1.1726 | 7439616 |
0.1902 | 0.1336 | 140 | 1.1729 | 7712336 |
0.2276 | 0.1383 | 145 | 1.1752 | 7993304 |
0.2112 | 0.1431 | 150 | 1.1703 | 8267944 |
0.2477 | 0.1479 | 155 | 1.1689 | 8538664 |
0.1573 | 0.1526 | 160 | 1.1661 | 8812544 |
0.1529 | 0.1574 | 165 | 1.1633 | 9084840 |
0.2697 | 0.1622 | 170 | 1.1603 | 9354376 |
0.155 | 0.1669 | 175 | 1.1664 | 9627880 |
0.2587 | 0.1717 | 180 | 1.1585 | 9900592 |
0.1548 | 0.1765 | 185 | 1.1552 | 10177088 |
0.2937 | 0.1813 | 190 | 1.1693 | 10452280 |
0.2339 | 0.1860 | 195 | 1.1561 | 10724920 |
0.2026 | 0.1908 | 200 | 1.1545 | 11001192 |
0.2147 | 0.1956 | 205 | 1.1566 | 11274040 |
0.1414 | 0.2003 | 210 | 1.1515 | 11546672 |
0.2323 | 0.2051 | 215 | 1.1534 | 11817992 |
0.1533 | 0.2099 | 220 | 1.1526 | 12089480 |
0.14 | 0.2146 | 225 | 1.1463 | 12365984 |
0.1898 | 0.2194 | 230 | 1.1484 | 12639416 |
0.2115 | 0.2242 | 235 | 1.1471 | 12920944 |
0.2441 | 0.2290 | 240 | 1.1456 | 13194920 |
0.18 | 0.2337 | 245 | 1.1457 | 13472976 |
0.1868 | 0.2385 | 250 | 1.1437 | 13750128 |
0.1601 | 0.2433 | 255 | 1.1445 | 14020560 |
0.1777 | 0.2480 | 260 | 1.1449 | 14292552 |
0.2047 | 0.2528 | 265 | 1.1402 | 14564888 |
0.1425 | 0.2576 | 270 | 1.1429 | 14841608 |
0.2403 | 0.2623 | 275 | 1.1423 | 15113080 |
0.1517 | 0.2671 | 280 | 1.1403 | 15387104 |
0.1189 | 0.2719 | 285 | 1.1384 | 15661744 |
0.1925 | 0.2767 | 290 | 1.1422 | 15929144 |
0.2971 | 0.2814 | 295 | 1.1411 | 16211016 |
0.1877 | 0.2862 | 300 | 1.1355 | 16479672 |
0.1461 | 0.2910 | 305 | 1.1430 | 16749592 |
0.1857 | 0.2957 | 310 | 1.1376 | 17026336 |
0.2592 | 0.3005 | 315 | 1.1343 | 17298512 |
0.1672 | 0.3053 | 320 | 1.1342 | 17575064 |
0.2067 | 0.3100 | 325 | 1.1332 | 17854904 |
0.2002 | 0.3148 | 330 | 1.1331 | 18130160 |
0.1643 | 0.3196 | 335 | 1.1326 | 18399608 |
0.1651 | 0.3244 | 340 | 1.1321 | 18678344 |
0.171 | 0.3291 | 345 | 1.1312 | 18951800 |
0.1323 | 0.3339 | 350 | 1.1295 | 19225944 |
0.2588 | 0.3387 | 355 | 1.1320 | 19499856 |
0.2399 | 0.3434 | 360 | 1.1316 | 19771272 |
0.1582 | 0.3482 | 365 | 1.1295 | 20044040 |
0.2115 | 0.3530 | 370 | 1.1286 | 20315888 |
0.2802 | 0.3577 | 375 | 1.1303 | 20584672 |
0.1257 | 0.3625 | 380 | 1.1302 | 20862920 |
0.2851 | 0.3673 | 385 | 1.1301 | 21143672 |
0.1692 | 0.3720 | 390 | 1.1275 | 21420672 |
0.1253 | 0.3768 | 395 | 1.1282 | 21690272 |
0.1186 | 0.3816 | 400 | 1.1294 | 21964296 |
0.1799 | 0.3864 | 405 | 1.1287 | 22236320 |
0.1645 | 0.3911 | 410 | 1.1270 | 22508600 |
0.1555 | 0.3959 | 415 | 1.1259 | 22789664 |
0.2175 | 0.4007 | 420 | 1.1241 | 23057640 |
0.138 | 0.4054 | 425 | 1.1253 | 23326680 |
0.103 | 0.4102 | 430 | 1.1266 | 23597896 |
0.2432 | 0.4150 | 435 | 1.1239 | 23872536 |
0.2285 | 0.4197 | 440 | 1.1235 | 24145512 |
0.2308 | 0.4245 | 445 | 1.1249 | 24421016 |
0.1546 | 0.4293 | 450 | 1.1212 | 24700832 |
0.1554 | 0.4341 | 455 | 1.1214 | 24981344 |
0.1487 | 0.4388 | 460 | 1.1230 | 25259784 |
0.2857 | 0.4436 | 465 | 1.1205 | 25541392 |
0.1433 | 0.4484 | 470 | 1.1215 | 25811664 |
0.2118 | 0.4531 | 475 | 1.1252 | 26083832 |
0.1689 | 0.4579 | 480 | 1.1200 | 26355200 |
0.2952 | 0.4627 | 485 | 1.1198 | 26624656 |
0.1463 | 0.4674 | 490 | 1.1204 | 26893408 |
0.1937 | 0.4722 | 495 | 1.1193 | 27169048 |
0.2256 | 0.4770 | 500 | 1.1187 | 27444320 |
0.261 | 0.4818 | 505 | 1.1203 | 27722760 |
0.2065 | 0.4865 | 510 | 1.1181 | 28007144 |
0.2435 | 0.4913 | 515 | 1.1180 | 28284584 |
0.2162 | 0.4961 | 520 | 1.1182 | 28558768 |
0.0977 | 0.5008 | 525 | 1.1156 | 28832184 |
0.1592 | 0.5056 | 530 | 1.1172 | 29106024 |
0.2281 | 0.5104 | 535 | 1.1191 | 29380552 |
0.2133 | 0.5151 | 540 | 1.1155 | 29653184 |
0.1949 | 0.5199 | 545 | 1.1149 | 29931800 |
0.2195 | 0.5247 | 550 | 1.1162 | 30206216 |
0.1479 | 0.5295 | 555 | 1.1172 | 30479464 |
0.2328 | 0.5342 | 560 | 1.1165 | 30744912 |
0.1147 | 0.5390 | 565 | 1.1162 | 31016856 |
0.1808 | 0.5438 | 570 | 1.1172 | 31290632 |
0.1348 | 0.5485 | 575 | 1.1138 | 31565776 |
0.191 | 0.5533 | 580 | 1.1129 | 31840416 |
0.1971 | 0.5581 | 585 | 1.1141 | 32115112 |
0.2878 | 0.5628 | 590 | 1.1132 | 32382568 |
0.1397 | 0.5676 | 595 | 1.1145 | 32652336 |
0.1622 | 0.5724 | 600 | 1.1147 | 32926376 |
0.1645 | 0.5772 | 605 | 1.1126 | 33202408 |
0.1748 | 0.5819 | 610 | 1.1129 | 33479448 |
0.1364 | 0.5867 | 615 | 1.1191 | 33755856 |
0.1537 | 0.5915 | 620 | 1.1132 | 34023888 |
0.2257 | 0.5962 | 625 | 1.1120 | 34300240 |
0.193 | 0.6010 | 630 | 1.1129 | 34575272 |
0.1591 | 0.6058 | 635 | 1.1140 | 34846568 |
0.2357 | 0.6105 | 640 | 1.1127 | 35111368 |
0.2122 | 0.6153 | 645 | 1.1104 | 35381512 |
0.2296 | 0.6201 | 650 | 1.1126 | 35652760 |
0.2162 | 0.6249 | 655 | 1.1105 | 35934616 |
0.1512 | 0.6296 | 660 | 1.1085 | 36204072 |
0.1565 | 0.6344 | 665 | 1.1101 | 36480472 |
0.1804 | 0.6392 | 670 | 1.1120 | 36750216 |
0.2327 | 0.6439 | 675 | 1.1099 | 37026448 |
0.137 | 0.6487 | 680 | 1.1101 | 37304624 |
0.1508 | 0.6535 | 685 | 1.1115 | 37584368 |
0.2355 | 0.6582 | 690 | 1.1094 | 37859192 |
0.1842 | 0.6630 | 695 | 1.1079 | 38140704 |
0.2053 | 0.6678 | 700 | 1.1100 | 38422936 |
0.1421 | 0.6725 | 705 | 1.1092 | 38700416 |
0.1501 | 0.6773 | 710 | 1.1084 | 38981096 |
0.2319 | 0.6821 | 715 | 1.1094 | 39257496 |
0.1176 | 0.6869 | 720 | 1.1101 | 39533328 |
0.1939 | 0.6916 | 725 | 1.1071 | 39810760 |
0.1603 | 0.6964 | 730 | 1.1056 | 40078576 |
0.1874 | 0.7012 | 735 | 1.1070 | 40354912 |
0.139 | 0.7059 | 740 | 1.1098 | 40624408 |
0.2035 | 0.7107 | 745 | 1.1069 | 40901880 |
0.2135 | 0.7155 | 750 | 1.1057 | 41168008 |
0.1524 | 0.7202 | 755 | 1.1047 | 41445360 |
0.2232 | 0.7250 | 760 | 1.1040 | 41726120 |
0.1531 | 0.7298 | 765 | 1.1032 | 42005192 |
0.161 | 0.7346 | 770 | 1.1049 | 42278432 |
0.2795 | 0.7393 | 775 | 1.1041 | 42546952 |
0.179 | 0.7441 | 780 | 1.1050 | 42823136 |
0.1964 | 0.7489 | 785 | 1.1039 | 43099096 |
0.1943 | 0.7536 | 790 | 1.1040 | 43374520 |
0.1432 | 0.7584 | 795 | 1.1045 | 43648920 |
0.1129 | 0.7632 | 800 | 1.1038 | 43922904 |
0.2349 | 0.7679 | 805 | 1.1034 | 44202160 |
0.1664 | 0.7727 | 810 | 1.1029 | 44481880 |
0.212 | 0.7775 | 815 | 1.1047 | 44763008 |
0.1529 | 0.7823 | 820 | 1.1044 | 45034848 |
0.1846 | 0.7870 | 825 | 1.1022 | 45313728 |
0.1814 | 0.7918 | 830 | 1.1009 | 45596136 |
0.2488 | 0.7966 | 835 | 1.1031 | 45876792 |
0.1445 | 0.8013 | 840 | 1.1036 | 46149456 |
0.1875 | 0.8061 | 845 | 1.1017 | 46428088 |
0.2112 | 0.8109 | 850 | 1.1002 | 46699200 |
0.16 | 0.8156 | 855 | 1.1011 | 46978064 |
0.1861 | 0.8204 | 860 | 1.1026 | 47260120 |
0.1451 | 0.8252 | 865 | 1.1020 | 47535184 |
0.2099 | 0.8300 | 870 | 1.1009 | 47805592 |
0.1349 | 0.8347 | 875 | 1.1017 | 48083288 |
0.1281 | 0.8395 | 880 | 1.1039 | 48359856 |
0.2059 | 0.8443 | 885 | 1.1011 | 48635480 |
0.1273 | 0.8490 | 890 | 1.1012 | 48907152 |
0.1119 | 0.8538 | 895 | 1.1017 | 49190432 |
0.1578 | 0.8586 | 900 | 1.1003 | 49466416 |
0.1527 | 0.8633 | 905 | 1.1007 | 49742840 |
0.1948 | 0.8681 | 910 | 1.1022 | 50006928 |
0.2125 | 0.8729 | 915 | 1.1012 | 50280032 |
0.1793 | 0.8777 | 920 | 1.1007 | 50554592 |
0.1295 | 0.8824 | 925 | 1.1002 | 50832056 |
0.1383 | 0.8872 | 930 | 1.1009 | 51107552 |
0.2569 | 0.8920 | 935 | 1.0993 | 51381432 |
0.1212 | 0.8967 | 940 | 1.1002 | 51665232 |
0.2039 | 0.9015 | 945 | 1.1034 | 51944288 |
0.2036 | 0.9063 | 950 | 1.1003 | 52220224 |
0.1233 | 0.9110 | 955 | 1.0989 | 52500992 |
0.1297 | 0.9158 | 960 | 1.0979 | 52774096 |
0.1562 | 0.9206 | 965 | 1.0985 | 53048096 |
0.1706 | 0.9254 | 970 | 1.1013 | 53323848 |
0.2443 | 0.9301 | 975 | 1.1002 | 53605952 |
0.1889 | 0.9349 | 980 | 1.0978 | 53880704 |
0.1236 | 0.9397 | 985 | 1.0987 | 54156976 |
0.1949 | 0.9444 | 990 | 1.0998 | 54432432 |
0.2046 | 0.9492 | 995 | 1.0986 | 54713288 |
0.2039 | 0.9540 | 1000 | 1.0977 | 54987288 |
0.2096 | 0.9587 | 1005 | 1.0978 | 55272664 |
0.1781 | 0.9635 | 1010 | 1.0987 | 55548680 |
0.2138 | 0.9683 | 1015 | 1.0978 | 55831208 |
0.1476 | 0.9731 | 1020 | 1.0972 | 56108072 |
0.1788 | 0.9778 | 1025 | 1.0975 | 56385328 |
0.1646 | 0.9826 | 1030 | 1.0974 | 56662832 |
0.1682 | 0.9874 | 1035 | 1.0977 | 56937776 |
0.1583 | 0.9921 | 1040 | 1.0964 | 57203296 |
0.1243 | 0.9969 | 1045 | 1.0963 | 57468416 |
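For reference, the final validation loss can be converted to a perplexity by exponentiation, assuming (as is standard for causal-LM fine-tuning) that the reported loss is a mean per-token cross-entropy in nats:

```python
import math

# Perplexity = exp(mean cross-entropy loss), assuming the loss is in nats.
final_eval_loss = 1.0972
perplexity = math.exp(final_eval_loss)
print(f"{perplexity:.3f}")  # just under 3.0
```

A perplexity just under 3 on the held-out set, alongside training losses that fall well below the validation loss (down to ~0.12–0.25), is consistent with the model fitting the training distribution much more tightly than the evaluation distribution.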
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
## Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter11_sftsd0

Base model: google/gemma-2-2b