# collapse_gemma-2-2b_hs2_accumulate_iter9_sftsd0
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 1.0935
- Num Input Tokens Seen: 47161024
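For intuition, a cross-entropy loss of 1.0935 corresponds to a perplexity of roughly e^1.0935 ≈ 2.98. This quick sanity check is not part of the original evaluation output:

```python
import math

# Perplexity is the exponential of the (natural-log) cross-entropy loss.
eval_loss = 1.0935
perplexity = math.exp(eval_loss)
print(round(perplexity, 2))  # ~2.98
```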
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
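The total train batch size listed above follows from the per-device batch size and the gradient accumulation steps. A minimal sanity check (variable names here mirror the hyperparameter names and are illustrative):

```python
# With gradient accumulation, the optimizer steps once every
# `gradient_accumulation_steps` micro-batches, so the effective
# batch size is the product of the two settings.
train_batch_size = 8
gradient_accumulation_steps = 16
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 128
```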
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3909 | 0 |
1.6978 | 0.0059 | 5 | 1.3875 | 274600 |
1.6968 | 0.0117 | 10 | 1.3604 | 551880 |
1.5632 | 0.0176 | 15 | 1.2973 | 830064 |
1.4379 | 0.0235 | 20 | 1.2501 | 1106272 |
1.2817 | 0.0293 | 25 | 1.2088 | 1385768 |
1.1517 | 0.0352 | 30 | 1.1946 | 1659216 |
1.0942 | 0.0410 | 35 | 1.2074 | 1932440 |
0.9254 | 0.0469 | 40 | 1.2117 | 2196992 |
0.721 | 0.0528 | 45 | 1.2553 | 2471896 |
0.6813 | 0.0586 | 50 | 1.2657 | 2750888 |
0.5062 | 0.0645 | 55 | 1.2519 | 3031200 |
0.4411 | 0.0704 | 60 | 1.2509 | 3310768 |
0.5075 | 0.0762 | 65 | 1.2138 | 3586744 |
0.424 | 0.0821 | 70 | 1.2202 | 3864368 |
0.3142 | 0.0880 | 75 | 1.2079 | 4143224 |
0.2252 | 0.0938 | 80 | 1.2041 | 4417616 |
0.3666 | 0.0997 | 85 | 1.1931 | 4698824 |
0.2676 | 0.1056 | 90 | 1.1907 | 4974864 |
0.2986 | 0.1114 | 95 | 1.1829 | 5254936 |
0.3284 | 0.1173 | 100 | 1.1789 | 5531808 |
0.2768 | 0.1231 | 105 | 1.1780 | 5813768 |
0.2614 | 0.1290 | 110 | 1.1698 | 6088464 |
0.3301 | 0.1349 | 115 | 1.1707 | 6361032 |
0.2698 | 0.1407 | 120 | 1.1681 | 6633216 |
0.2382 | 0.1466 | 125 | 1.1715 | 6898784 |
0.2809 | 0.1525 | 130 | 1.1615 | 7174520 |
0.1634 | 0.1583 | 135 | 1.1643 | 7452568 |
0.2471 | 0.1642 | 140 | 1.1642 | 7724608 |
0.1667 | 0.1701 | 145 | 1.1564 | 8002504 |
0.2317 | 0.1759 | 150 | 1.1592 | 8276920 |
0.1864 | 0.1818 | 155 | 1.1562 | 8546632 |
0.328 | 0.1877 | 160 | 1.1617 | 8820408 |
0.2321 | 0.1935 | 165 | 1.1533 | 9098952 |
0.328 | 0.1994 | 170 | 1.1538 | 9377960 |
0.2287 | 0.2052 | 175 | 1.1532 | 9656936 |
0.1867 | 0.2111 | 180 | 1.1476 | 9927760 |
0.2382 | 0.2170 | 185 | 1.1491 | 10196448 |
0.2029 | 0.2228 | 190 | 1.1486 | 10481112 |
0.2068 | 0.2287 | 195 | 1.1454 | 10752608 |
0.224 | 0.2346 | 200 | 1.1457 | 11028048 |
0.2916 | 0.2404 | 205 | 1.1472 | 11300400 |
0.2391 | 0.2463 | 210 | 1.1427 | 11583504 |
0.2156 | 0.2522 | 215 | 1.1454 | 11862704 |
0.3089 | 0.2580 | 220 | 1.1440 | 12140720 |
0.1917 | 0.2639 | 225 | 1.1418 | 12416376 |
0.2594 | 0.2698 | 230 | 1.1421 | 12684952 |
0.2177 | 0.2756 | 235 | 1.1389 | 12964520 |
0.2374 | 0.2815 | 240 | 1.1370 | 13242784 |
0.171 | 0.2873 | 245 | 1.1378 | 13518784 |
0.2806 | 0.2932 | 250 | 1.1359 | 13792912 |
0.2614 | 0.2991 | 255 | 1.1365 | 14074704 |
0.2482 | 0.3049 | 260 | 1.1348 | 14352848 |
0.1728 | 0.3108 | 265 | 1.1351 | 14630696 |
0.2119 | 0.3167 | 270 | 1.1329 | 14900232 |
0.162 | 0.3225 | 275 | 1.1337 | 15172232 |
0.2106 | 0.3284 | 280 | 1.1337 | 15451416 |
0.2879 | 0.3343 | 285 | 1.1293 | 15728472 |
0.2585 | 0.3401 | 290 | 1.1335 | 16007616 |
0.1748 | 0.3460 | 295 | 1.1290 | 16289248 |
0.2434 | 0.3519 | 300 | 1.1281 | 16565984 |
0.2244 | 0.3577 | 305 | 1.1273 | 16836104 |
0.1754 | 0.3636 | 310 | 1.1267 | 17113904 |
0.2801 | 0.3694 | 315 | 1.1259 | 17388800 |
0.2446 | 0.3753 | 320 | 1.1247 | 17671920 |
0.2209 | 0.3812 | 325 | 1.1254 | 17947288 |
0.221 | 0.3870 | 330 | 1.1272 | 18226160 |
0.2041 | 0.3929 | 335 | 1.1237 | 18508432 |
0.2326 | 0.3988 | 340 | 1.1229 | 18786696 |
0.2101 | 0.4046 | 345 | 1.1238 | 19058776 |
0.2556 | 0.4105 | 350 | 1.1242 | 19337296 |
0.2378 | 0.4164 | 355 | 1.1204 | 19612288 |
0.2502 | 0.4222 | 360 | 1.1228 | 19889568 |
0.2006 | 0.4281 | 365 | 1.1225 | 20165512 |
0.1631 | 0.4340 | 370 | 1.1203 | 20441856 |
0.1769 | 0.4398 | 375 | 1.1196 | 20721896 |
0.2304 | 0.4457 | 380 | 1.1196 | 20995640 |
0.1723 | 0.4515 | 385 | 1.1208 | 21275640 |
0.2462 | 0.4574 | 390 | 1.1173 | 21553328 |
0.2023 | 0.4633 | 395 | 1.1160 | 21830856 |
0.1751 | 0.4691 | 400 | 1.1204 | 22119056 |
0.1812 | 0.4750 | 405 | 1.1187 | 22393880 |
0.1914 | 0.4809 | 410 | 1.1171 | 22667272 |
0.2056 | 0.4867 | 415 | 1.1174 | 22944736 |
0.2147 | 0.4926 | 420 | 1.1167 | 23227016 |
0.3146 | 0.4985 | 425 | 1.1144 | 23504144 |
0.214 | 0.5043 | 430 | 1.1148 | 23779472 |
0.2443 | 0.5102 | 435 | 1.1143 | 24059400 |
0.2042 | 0.5161 | 440 | 1.1142 | 24337672 |
0.1774 | 0.5219 | 445 | 1.1136 | 24610616 |
0.2231 | 0.5278 | 450 | 1.1133 | 24893064 |
0.2154 | 0.5336 | 455 | 1.1137 | 25168152 |
0.2084 | 0.5395 | 460 | 1.1148 | 25443848 |
0.2072 | 0.5454 | 465 | 1.1121 | 25721560 |
0.1912 | 0.5512 | 470 | 1.1107 | 25994480 |
0.2178 | 0.5571 | 475 | 1.1130 | 26261104 |
0.1787 | 0.5630 | 480 | 1.1112 | 26537640 |
0.3052 | 0.5688 | 485 | 1.1110 | 26816312 |
0.2065 | 0.5747 | 490 | 1.1133 | 27096072 |
0.1746 | 0.5806 | 495 | 1.1089 | 27368392 |
0.1599 | 0.5864 | 500 | 1.1101 | 27645120 |
0.2903 | 0.5923 | 505 | 1.1103 | 27924664 |
0.1574 | 0.5982 | 510 | 1.1077 | 28206072 |
0.2289 | 0.6040 | 515 | 1.1112 | 28477632 |
0.2916 | 0.6099 | 520 | 1.1101 | 28758032 |
0.2304 | 0.6157 | 525 | 1.1078 | 29036376 |
0.2007 | 0.6216 | 530 | 1.1073 | 29313576 |
0.189 | 0.6275 | 535 | 1.1087 | 29588168 |
0.2012 | 0.6333 | 540 | 1.1064 | 29862896 |
0.2344 | 0.6392 | 545 | 1.1068 | 30142696 |
0.142 | 0.6451 | 550 | 1.1092 | 30415440 |
0.187 | 0.6509 | 555 | 1.1066 | 30687944 |
0.2094 | 0.6568 | 560 | 1.1076 | 30963280 |
0.2935 | 0.6627 | 565 | 1.1063 | 31237408 |
0.2056 | 0.6685 | 570 | 1.1054 | 31515568 |
0.1976 | 0.6744 | 575 | 1.1063 | 31785880 |
0.2251 | 0.6803 | 580 | 1.1062 | 32069232 |
0.3552 | 0.6861 | 585 | 1.1065 | 32342456 |
0.1912 | 0.6920 | 590 | 1.1074 | 32610824 |
0.2423 | 0.6978 | 595 | 1.1049 | 32896728 |
0.1719 | 0.7037 | 600 | 1.1034 | 33172488 |
0.162 | 0.7096 | 605 | 1.1054 | 33445888 |
0.1883 | 0.7154 | 610 | 1.1056 | 33724592 |
0.1962 | 0.7213 | 615 | 1.1041 | 34008736 |
0.3558 | 0.7272 | 620 | 1.1036 | 34284000 |
0.0859 | 0.7330 | 625 | 1.1049 | 34566152 |
0.2116 | 0.7389 | 630 | 1.1060 | 34836984 |
0.2408 | 0.7448 | 635 | 1.1046 | 35111192 |
0.1857 | 0.7506 | 640 | 1.1019 | 35383904 |
0.2476 | 0.7565 | 645 | 1.1014 | 35667144 |
0.1967 | 0.7624 | 650 | 1.1032 | 35947664 |
0.1636 | 0.7682 | 655 | 1.1024 | 36224040 |
0.2297 | 0.7741 | 660 | 1.1021 | 36506760 |
0.1982 | 0.7799 | 665 | 1.1027 | 36788272 |
0.1771 | 0.7858 | 670 | 1.1020 | 37063152 |
0.1424 | 0.7917 | 675 | 1.1010 | 37341552 |
0.2286 | 0.7975 | 680 | 1.1002 | 37620824 |
0.2375 | 0.8034 | 685 | 1.1000 | 37901128 |
0.177 | 0.8093 | 690 | 1.1015 | 38177616 |
0.2024 | 0.8151 | 695 | 1.1017 | 38456216 |
0.2342 | 0.8210 | 700 | 1.0998 | 38736312 |
0.2695 | 0.8269 | 705 | 1.1008 | 39007952 |
0.224 | 0.8327 | 710 | 1.1006 | 39285064 |
0.1958 | 0.8386 | 715 | 1.1003 | 39558848 |
0.1531 | 0.8445 | 720 | 1.1023 | 39838384 |
0.1274 | 0.8503 | 725 | 1.0987 | 40116400 |
0.2094 | 0.8562 | 730 | 1.0985 | 40393288 |
0.2383 | 0.8620 | 735 | 1.0998 | 40676008 |
0.1961 | 0.8679 | 740 | 1.0985 | 40957096 |
0.162 | 0.8738 | 745 | 1.0987 | 41235096 |
0.1646 | 0.8796 | 750 | 1.1000 | 41507944 |
0.1752 | 0.8855 | 755 | 1.0999 | 41781824 |
0.1349 | 0.8914 | 760 | 1.0986 | 42052440 |
0.1325 | 0.8972 | 765 | 1.0972 | 42327280 |
0.2107 | 0.9031 | 770 | 1.0982 | 42600664 |
0.1816 | 0.9090 | 775 | 1.1000 | 42872616 |
0.1629 | 0.9148 | 780 | 1.0993 | 43149280 |
0.2662 | 0.9207 | 785 | 1.0968 | 43426168 |
0.1063 | 0.9266 | 790 | 1.0965 | 43708912 |
0.3161 | 0.9324 | 795 | 1.0972 | 43985776 |
0.1565 | 0.9383 | 800 | 1.0962 | 44267560 |
0.1986 | 0.9441 | 805 | 1.0966 | 44544272 |
0.2149 | 0.9500 | 810 | 1.0966 | 44820528 |
0.2001 | 0.9559 | 815 | 1.0957 | 45099560 |
0.1245 | 0.9617 | 820 | 1.0949 | 45379528 |
0.1656 | 0.9676 | 825 | 1.0948 | 45649552 |
0.1829 | 0.9735 | 830 | 1.0944 | 45924520 |
0.2247 | 0.9793 | 835 | 1.0945 | 46207456 |
0.1744 | 0.9852 | 840 | 1.0952 | 46486032 |
0.2166 | 0.9911 | 845 | 1.0949 | 46765376 |
0.1605 | 0.9969 | 850 | 1.0945 | 47050784 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1