# collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0940
- Num Input Tokens Seen: 41008256
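For intuition, if the reported loss is the usual mean per-token cross-entropy in nats (an assumption; the card does not state this), it corresponds to a perplexity of about e^1.0940 ≈ 2.99:

```python
import math

# Assumes the reported eval loss is mean per-token cross-entropy in nats.
eval_loss = 1.0940
print(math.exp(eval_loss))  # ~2.99 perplexity
```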
## Model description
More information needed
## Intended uses & limitations
More information needed
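In lieu of official usage guidance, here is a minimal inference sketch, assuming the standard `transformers` causal-LM API and this repository's id; it is not an example provided by the model author:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id, taken from this card's title and model tree.
model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma-2 checkpoints are typically served in bf16
    device_map="auto",           # requires the `accelerate` package
)

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```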
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch reconstructing them follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
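These settings map onto `transformers.TrainingArguments` roughly as follows. This is a reconstruction, not the author's actual training script; in particular, the total train batch size of 128 (8 per device × 16 accumulation steps) assumes a single device:

```python
from transformers import TrainingArguments

# Sketch only: rebuilds the reported hyperparameters from this card.
args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=16,   # eval_batch_size: 16
    seed=1,
    gradient_accumulation_steps=16,  # 8 * 16 = total train batch size 128 (single device assumed)
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    # The reported Adam betas/epsilon match the Trainer's optimizer defaults.
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```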
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3909 | 0 |
1.622 | 0.0066 | 5 | 1.3875 | 269304 |
1.5034 | 0.0131 | 10 | 1.3529 | 535392 |
1.4652 | 0.0197 | 15 | 1.2845 | 801488 |
1.3824 | 0.0263 | 20 | 1.2330 | 1083632 |
1.3513 | 0.0329 | 25 | 1.1882 | 1355072 |
1.1303 | 0.0394 | 30 | 1.1821 | 1635432 |
0.9928 | 0.0460 | 35 | 1.1924 | 1894472 |
0.8215 | 0.0526 | 40 | 1.2128 | 2161232 |
0.8303 | 0.0592 | 45 | 1.2421 | 2428280 |
0.5895 | 0.0657 | 50 | 1.2467 | 2702640 |
0.5274 | 0.0723 | 55 | 1.2585 | 2973544 |
0.4315 | 0.0789 | 60 | 1.2433 | 3236296 |
0.4844 | 0.0855 | 65 | 1.2217 | 3506176 |
0.3115 | 0.0920 | 70 | 1.2198 | 3780160 |
0.3854 | 0.0986 | 75 | 1.2028 | 4051568 |
0.3065 | 0.1052 | 80 | 1.1925 | 4324928 |
0.3682 | 0.1118 | 85 | 1.1846 | 4593592 |
0.5041 | 0.1183 | 90 | 1.1806 | 4867408 |
0.2775 | 0.1249 | 95 | 1.1759 | 5128832 |
0.2909 | 0.1315 | 100 | 1.1737 | 5401472 |
0.3715 | 0.1381 | 105 | 1.1742 | 5673312 |
0.3444 | 0.1446 | 110 | 1.1667 | 5945400 |
0.3783 | 0.1512 | 115 | 1.1666 | 6217600 |
0.2508 | 0.1578 | 120 | 1.1635 | 6483312 |
0.2896 | 0.1644 | 125 | 1.1591 | 6757952 |
0.2647 | 0.1709 | 130 | 1.1586 | 7031456 |
0.1641 | 0.1775 | 135 | 1.1563 | 7296128 |
0.2283 | 0.1841 | 140 | 1.1550 | 7571176 |
0.2946 | 0.1906 | 145 | 1.1524 | 7847912 |
0.2922 | 0.1972 | 150 | 1.1484 | 8116960 |
0.2966 | 0.2038 | 155 | 1.1481 | 8393608 |
0.268 | 0.2104 | 160 | 1.1539 | 8663712 |
0.2847 | 0.2169 | 165 | 1.1498 | 8925096 |
0.2498 | 0.2235 | 170 | 1.1483 | 9194968 |
0.2431 | 0.2301 | 175 | 1.1496 | 9464256 |
0.2411 | 0.2367 | 180 | 1.1453 | 9727032 |
0.2876 | 0.2432 | 185 | 1.1429 | 9997984 |
0.3148 | 0.2498 | 190 | 1.1435 | 10271224 |
0.2655 | 0.2564 | 195 | 1.1408 | 10546488 |
0.2446 | 0.2630 | 200 | 1.1415 | 10805248 |
0.2493 | 0.2695 | 205 | 1.1428 | 11074256 |
0.2977 | 0.2761 | 210 | 1.1383 | 11346264 |
0.3008 | 0.2827 | 215 | 1.1380 | 11612816 |
0.212 | 0.2893 | 220 | 1.1349 | 11891040 |
0.2596 | 0.2958 | 225 | 1.1377 | 12163592 |
0.1793 | 0.3024 | 230 | 1.1370 | 12425752 |
0.248 | 0.3090 | 235 | 1.1325 | 12694640 |
0.2415 | 0.3156 | 240 | 1.1331 | 12963992 |
0.2047 | 0.3221 | 245 | 1.1319 | 13234768 |
0.1848 | 0.3287 | 250 | 1.1310 | 13511432 |
0.1624 | 0.3353 | 255 | 1.1309 | 13785032 |
0.2183 | 0.3419 | 260 | 1.1269 | 14052560 |
0.2079 | 0.3484 | 265 | 1.1321 | 14318664 |
0.1957 | 0.3550 | 270 | 1.1292 | 14591392 |
0.1832 | 0.3616 | 275 | 1.1273 | 14857944 |
0.2016 | 0.3681 | 280 | 1.1240 | 15133456 |
0.2329 | 0.3747 | 285 | 1.1258 | 15404048 |
0.2867 | 0.3813 | 290 | 1.1256 | 15674488 |
0.2546 | 0.3879 | 295 | 1.1245 | 15950072 |
0.2182 | 0.3944 | 300 | 1.1226 | 16211512 |
0.2931 | 0.4010 | 305 | 1.1222 | 16484192 |
0.2325 | 0.4076 | 310 | 1.1228 | 16754264 |
0.2637 | 0.4142 | 315 | 1.1211 | 17023608 |
0.1728 | 0.4207 | 320 | 1.1188 | 17305976 |
0.2263 | 0.4273 | 325 | 1.1195 | 17575456 |
0.2625 | 0.4339 | 330 | 1.1184 | 17840744 |
0.1631 | 0.4405 | 335 | 1.1177 | 18105176 |
0.1778 | 0.4470 | 340 | 1.1180 | 18369064 |
0.327 | 0.4536 | 345 | 1.1150 | 18635856 |
0.2488 | 0.4602 | 350 | 1.1160 | 18906504 |
0.2863 | 0.4668 | 355 | 1.1146 | 19171744 |
0.2554 | 0.4733 | 360 | 1.1152 | 19443216 |
0.2097 | 0.4799 | 365 | 1.1171 | 19710312 |
0.2428 | 0.4865 | 370 | 1.1147 | 19983280 |
0.1757 | 0.4931 | 375 | 1.1157 | 20253048 |
0.2844 | 0.4996 | 380 | 1.1143 | 20521536 |
0.2519 | 0.5062 | 385 | 1.1135 | 20793304 |
0.14 | 0.5128 | 390 | 1.1135 | 21056880 |
0.175 | 0.5194 | 395 | 1.1139 | 21322760 |
0.2719 | 0.5259 | 400 | 1.1138 | 21588632 |
0.2211 | 0.5325 | 405 | 1.1119 | 21863192 |
0.2711 | 0.5391 | 410 | 1.1115 | 22136640 |
0.2192 | 0.5456 | 415 | 1.1097 | 22400024 |
0.2555 | 0.5522 | 420 | 1.1088 | 22663600 |
0.2381 | 0.5588 | 425 | 1.1071 | 22931864 |
0.287 | 0.5654 | 430 | 1.1090 | 23211784 |
0.2197 | 0.5719 | 435 | 1.1079 | 23473528 |
0.1785 | 0.5785 | 440 | 1.1071 | 23741512 |
0.1782 | 0.5851 | 445 | 1.1088 | 24013864 |
0.1792 | 0.5917 | 450 | 1.1081 | 24283944 |
0.2492 | 0.5982 | 455 | 1.1053 | 24555032 |
0.2555 | 0.6048 | 460 | 1.1070 | 24818080 |
0.2014 | 0.6114 | 465 | 1.1091 | 25091208 |
0.1869 | 0.6180 | 470 | 1.1049 | 25354352 |
0.2532 | 0.6245 | 475 | 1.1049 | 25626256 |
0.2373 | 0.6311 | 480 | 1.1082 | 25900944 |
0.1992 | 0.6377 | 485 | 1.1064 | 26173568 |
0.2187 | 0.6443 | 490 | 1.1063 | 26447272 |
0.2218 | 0.6508 | 495 | 1.1089 | 26715952 |
0.2322 | 0.6574 | 500 | 1.1061 | 26983200 |
0.2482 | 0.6640 | 505 | 1.1060 | 27247440 |
0.1582 | 0.6706 | 510 | 1.1054 | 27515256 |
0.2757 | 0.6771 | 515 | 1.1051 | 27778344 |
0.1809 | 0.6837 | 520 | 1.1047 | 28049984 |
0.2369 | 0.6903 | 525 | 1.1042 | 28324744 |
0.2848 | 0.6969 | 530 | 1.1050 | 28589688 |
0.2827 | 0.7034 | 535 | 1.1021 | 28861280 |
0.2411 | 0.7100 | 540 | 1.1027 | 29129832 |
0.2118 | 0.7166 | 545 | 1.1020 | 29399128 |
0.1694 | 0.7231 | 550 | 1.1019 | 29669072 |
0.234 | 0.7297 | 555 | 1.1027 | 29932936 |
0.2118 | 0.7363 | 560 | 1.1031 | 30200984 |
0.2381 | 0.7429 | 565 | 1.1006 | 30467952 |
0.2596 | 0.7494 | 570 | 1.1016 | 30740152 |
0.2517 | 0.7560 | 575 | 1.1025 | 31013280 |
0.2295 | 0.7626 | 580 | 1.1009 | 31283736 |
0.2093 | 0.7692 | 585 | 1.1000 | 31546048 |
0.2714 | 0.7757 | 590 | 1.1016 | 31810008 |
0.1723 | 0.7823 | 595 | 1.0997 | 32082696 |
0.2339 | 0.7889 | 600 | 1.0983 | 32349272 |
0.2226 | 0.7955 | 605 | 1.0987 | 32617856 |
0.24 | 0.8020 | 610 | 1.0993 | 32890144 |
0.2459 | 0.8086 | 615 | 1.0978 | 33155616 |
0.2352 | 0.8152 | 620 | 1.0977 | 33421616 |
0.1846 | 0.8218 | 625 | 1.1003 | 33689760 |
0.1827 | 0.8283 | 630 | 1.0984 | 33954944 |
0.2186 | 0.8349 | 635 | 1.0991 | 34220096 |
0.1833 | 0.8415 | 640 | 1.1003 | 34487888 |
0.2651 | 0.8481 | 645 | 1.0984 | 34759656 |
0.2547 | 0.8546 | 650 | 1.0970 | 35032040 |
0.1985 | 0.8612 | 655 | 1.0965 | 35302816 |
0.2972 | 0.8678 | 660 | 1.0979 | 35576712 |
0.2817 | 0.8744 | 665 | 1.0956 | 35850400 |
0.2383 | 0.8809 | 670 | 1.0975 | 36121904 |
0.1814 | 0.8875 | 675 | 1.0993 | 36393368 |
0.2137 | 0.8941 | 680 | 1.0943 | 36664864 |
0.1752 | 0.9006 | 685 | 1.0941 | 36939200 |
0.2005 | 0.9072 | 690 | 1.0983 | 37205904 |
0.3429 | 0.9138 | 695 | 1.0970 | 37482984 |
0.2312 | 0.9204 | 700 | 1.0943 | 37755048 |
0.1952 | 0.9269 | 705 | 1.0958 | 38019952 |
0.2054 | 0.9335 | 710 | 1.0963 | 38291888 |
0.2247 | 0.9401 | 715 | 1.0958 | 38561640 |
0.1912 | 0.9467 | 720 | 1.0958 | 38835512 |
0.2334 | 0.9532 | 725 | 1.0964 | 39110024 |
0.1795 | 0.9598 | 730 | 1.0948 | 39382208 |
0.1963 | 0.9664 | 735 | 1.0946 | 39654856 |
0.2492 | 0.9730 | 740 | 1.0952 | 39930376 |
0.2831 | 0.9795 | 745 | 1.0927 | 40202200 |
0.2232 | 0.9861 | 750 | 1.0936 | 40469640 |
0.1724 | 0.9927 | 755 | 1.0955 | 40736256 |
0.2259 | 0.9993 | 760 | 1.0940 | 41008256 |
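The validation loss drops quickly over the first ~1.4M tokens, rebounds slightly around 2–3M tokens, then declines slowly toward its final value of 1.0940. A minimal sketch for visualizing the curve, assuming matplotlib is available (points abbreviated from the table above):

```python
import matplotlib.pyplot as plt

# A few (input tokens seen, validation loss) points taken from the table above.
tokens = [0, 1_355_072, 2_702_640, 5_401_472, 10_805_248, 21_588_632, 32_349_272, 41_008_256]
val_loss = [1.3909, 1.1882, 1.2467, 1.1737, 1.1415, 1.1138, 1.0983, 1.0940]

plt.plot(tokens, val_loss, marker="o")
plt.xlabel("Input tokens seen")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd1")
plt.show()
```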
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1