# collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1119
- Num Input Tokens Seen: 46755704

## Model description

More information needed

## Intended uses & limitations

More information needed
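
In the absence of documented usage, the sketch below shows one plausible way to run inference with Transformers. It is an assumption, not part of the original card: the repository ID is this card's model name under the `jkazdan` namespace, and the dtype and generation settings are illustrative defaults.

```python
# Minimal inference sketch (assumption: standard causal-LM loading applies).
# The repository ID comes from this card; dtype/generation settings are guesses.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 is the usual choice for Gemma-2
    device_map="auto",
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```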

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an equivalent `TrainingArguments` sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
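
As referenced above, here is a minimal sketch of how these values map onto `transformers.TrainingArguments`, assuming the run used the Hugging Face `Trainer`; the `output_dir` is hypothetical, and every other value is copied from the list:

```python
from transformers import TrainingArguments

# Sketch only: assumes the HF Trainer was used; output_dir is hypothetical.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd0",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 * 16 = total train batch size of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```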

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
No log | 0 | 0 | 1.3956 | 0 |
1.7072 | 0.0059 | 5 | 1.3922 | 267024 |
1.5511 | 0.0117 | 10 | 1.3635 | 548560 |
1.5199 | 0.0176 | 15 | 1.2986 | 830488 |
1.4321 | 0.0234 | 20 | 1.2487 | 1101904 |
1.4119 | 0.0293 | 25 | 1.2016 | 1377928 |
1.2848 | 0.0351 | 30 | 1.1721 | 1661264 |
1.2144 | 0.0410 | 35 | 1.1683 | 1943544 |
1.1417 | 0.0468 | 40 | 1.1494 | 2221200 |
0.9487 | 0.0527 | 45 | 1.1817 | 2498336 |
0.9135 | 0.0585 | 50 | 1.1897 | 2763296 |
0.9445 | 0.0644 | 55 | 1.2065 | 3036752 |
0.8185 | 0.0702 | 60 | 1.2143 | 3310768 |
0.606 | 0.0761 | 65 | 1.2347 | 3581304 |
0.7169 | 0.0819 | 70 | 1.2386 | 3866600 |
0.6866 | 0.0878 | 75 | 1.2317 | 4142056 |
0.6366 | 0.0936 | 80 | 1.2168 | 4416216 |
0.5326 | 0.0995 | 85 | 1.2149 | 4686560 |
0.4364 | 0.1053 | 90 | 1.2276 | 4959768 |
0.4029 | 0.1112 | 95 | 1.2208 | 5230968 |
0.3969 | 0.1170 | 100 | 1.2249 | 5504384 |
0.4026 | 0.1229 | 105 | 1.2208 | 5775920 |
0.4528 | 0.1287 | 110 | 1.2238 | 6049704 |
0.4096 | 0.1346 | 115 | 1.2106 | 6327216 |
0.3988 | 0.1404 | 120 | 1.2170 | 6601008 |
0.4273 | 0.1463 | 125 | 1.2074 | 6877208 |
0.3648 | 0.1521 | 130 | 1.2093 | 7155944 |
0.282 | 0.1580 | 135 | 1.1978 | 7432256 |
0.3538 | 0.1638 | 140 | 1.2051 | 7705992 |
0.4239 | 0.1697 | 145 | 1.1951 | 7982024 |
0.4044 | 0.1755 | 150 | 1.2000 | 8253584 |
0.4297 | 0.1814 | 155 | 1.2035 | 8534352 |
0.2586 | 0.1872 | 160 | 1.1974 | 8795744 |
0.2682 | 0.1931 | 165 | 1.2044 | 9068272 |
0.3477 | 0.1989 | 170 | 1.1952 | 9346008 |
0.3633 | 0.2048 | 175 | 1.1954 | 9614616 |
0.3786 | 0.2106 | 180 | 1.1975 | 9889768 |
0.312 | 0.2165 | 185 | 1.1918 | 10167624 |
0.3204 | 0.2223 | 190 | 1.1910 | 10437856 |
0.3476 | 0.2282 | 195 | 1.1900 | 10712832 |
0.2801 | 0.2340 | 200 | 1.1882 | 10977528 |
0.2675 | 0.2399 | 205 | 1.1885 | 11245504 |
0.2818 | 0.2457 | 210 | 1.1840 | 11514312 |
0.2689 | 0.2516 | 215 | 1.1851 | 11793000 |
0.3491 | 0.2574 | 220 | 1.1872 | 12068576 |
0.3424 | 0.2633 | 225 | 1.1802 | 12342256 |
0.2694 | 0.2691 | 230 | 1.1810 | 12614032 |
0.4132 | 0.2750 | 235 | 1.1728 | 12882712 |
0.2893 | 0.2808 | 240 | 1.1700 | 13149128 |
0.2847 | 0.2867 | 245 | 1.1856 | 13421176 |
0.3198 | 0.2925 | 250 | 1.1693 | 13696120 |
0.2038 | 0.2984 | 255 | 1.1743 | 13965256 |
0.222 | 0.3042 | 260 | 1.1832 | 14243792 |
0.244 | 0.3101 | 265 | 1.1692 | 14524248 |
0.3439 | 0.3159 | 270 | 1.1722 | 14805296 |
0.2316 | 0.3218 | 275 | 1.1698 | 15078480 |
0.2024 | 0.3276 | 280 | 1.1734 | 15353592 |
0.2288 | 0.3335 | 285 | 1.1696 | 15628632 |
0.2868 | 0.3393 | 290 | 1.1661 | 15902808 |
0.3403 | 0.3452 | 295 | 1.1693 | 16179792 |
0.3238 | 0.3510 | 300 | 1.1663 | 16456880 |
0.236 | 0.3569 | 305 | 1.1625 | 16734104 |
0.1991 | 0.3627 | 310 | 1.1644 | 17008776 |
0.1729 | 0.3686 | 315 | 1.1646 | 17278024 |
0.2047 | 0.3744 | 320 | 1.1630 | 17550640 |
0.2911 | 0.3803 | 325 | 1.1582 | 17826960 |
0.1639 | 0.3861 | 330 | 1.1703 | 18098336 |
0.1956 | 0.3920 | 335 | 1.1660 | 18370416 |
0.2335 | 0.3978 | 340 | 1.1550 | 18640240 |
0.3123 | 0.4037 | 345 | 1.1614 | 18915192 |
0.2137 | 0.4095 | 350 | 1.1581 | 19182344 |
0.2683 | 0.4154 | 355 | 1.1541 | 19465576 |
0.2263 | 0.4212 | 360 | 1.1560 | 19743312 |
0.1861 | 0.4271 | 365 | 1.1590 | 20020896 |
0.2883 | 0.4329 | 370 | 1.1546 | 20294232 |
0.1755 | 0.4388 | 375 | 1.1525 | 20559040 |
0.213 | 0.4446 | 380 | 1.1534 | 20822032 |
0.1859 | 0.4505 | 385 | 1.1523 | 21099560 |
0.2529 | 0.4563 | 390 | 1.1537 | 21368144 |
0.242 | 0.4622 | 395 | 1.1498 | 21645832 |
0.1993 | 0.4680 | 400 | 1.1491 | 21924544 |
0.1637 | 0.4739 | 405 | 1.1509 | 22199720 |
0.1812 | 0.4797 | 410 | 1.1441 | 22477384 |
0.2141 | 0.4856 | 415 | 1.1454 | 22750888 |
0.2874 | 0.4914 | 420 | 1.1489 | 23027632 |
0.1906 | 0.4973 | 425 | 1.1413 | 23308144 |
0.2803 | 0.5031 | 430 | 1.1433 | 23580088 |
0.2174 | 0.5090 | 435 | 1.1437 | 23854088 |
0.2305 | 0.5148 | 440 | 1.1424 | 24134544 |
0.2014 | 0.5207 | 445 | 1.1465 | 24403536 |
0.2768 | 0.5265 | 450 | 1.1414 | 24680664 |
0.214 | 0.5324 | 455 | 1.1408 | 24952280 |
0.3169 | 0.5382 | 460 | 1.1445 | 25231192 |
0.2731 | 0.5441 | 465 | 1.1393 | 25505768 |
0.2496 | 0.5499 | 470 | 1.1391 | 25785544 |
0.2666 | 0.5558 | 475 | 1.1404 | 26056328 |
0.1958 | 0.5616 | 480 | 1.1394 | 26331200 |
0.1935 | 0.5675 | 485 | 1.1375 | 26610448 |
0.1744 | 0.5734 | 490 | 1.1368 | 26883696 |
0.2562 | 0.5792 | 495 | 1.1336 | 27155344 |
0.218 | 0.5851 | 500 | 1.1342 | 27427808 |
0.2348 | 0.5909 | 505 | 1.1335 | 27705544 |
0.2619 | 0.5968 | 510 | 1.1323 | 27974816 |
0.1454 | 0.6026 | 515 | 1.1351 | 28241360 |
0.2899 | 0.6085 | 520 | 1.1348 | 28513256 |
0.28 | 0.6143 | 525 | 1.1300 | 28781072 |
0.2314 | 0.6202 | 530 | 1.1314 | 29051688 |
0.1742 | 0.6260 | 535 | 1.1375 | 29322136 |
0.2316 | 0.6319 | 540 | 1.1320 | 29591728 |
0.197 | 0.6377 | 545 | 1.1289 | 29865856 |
0.2103 | 0.6436 | 550 | 1.1322 | 30139496 |
0.2218 | 0.6494 | 555 | 1.1290 | 30416656 |
0.205 | 0.6553 | 560 | 1.1265 | 30696792 |
0.1418 | 0.6611 | 565 | 1.1287 | 30971528 |
0.2414 | 0.6670 | 570 | 1.1276 | 31244968 |
0.2306 | 0.6728 | 575 | 1.1258 | 31520232 |
0.2341 | 0.6787 | 580 | 1.1275 | 31795864 |
0.2402 | 0.6845 | 585 | 1.1262 | 32069624 |
0.2602 | 0.6904 | 590 | 1.1263 | 32337864 |
0.2421 | 0.6962 | 595 | 1.1266 | 32618672 |
0.1608 | 0.7021 | 600 | 1.1260 | 32898536 |
0.266 | 0.7079 | 605 | 1.1234 | 33168224 |
0.1589 | 0.7138 | 610 | 1.1262 | 33433136 |
0.1982 | 0.7196 | 615 | 1.1257 | 33712384 |
0.1458 | 0.7255 | 620 | 1.1258 | 33981912 |
0.2513 | 0.7313 | 625 | 1.1299 | 34249392 |
0.1416 | 0.7372 | 630 | 1.1239 | 34521488 |
0.2103 | 0.7430 | 635 | 1.1246 | 34794184 |
0.2409 | 0.7489 | 640 | 1.1256 | 35068416 |
0.2248 | 0.7547 | 645 | 1.1218 | 35341160 |
0.2517 | 0.7606 | 650 | 1.1225 | 35618656 |
0.2098 | 0.7664 | 655 | 1.1215 | 35892176 |
0.2069 | 0.7723 | 660 | 1.1203 | 36174472 |
0.1857 | 0.7781 | 665 | 1.1229 | 36439872 |
0.2552 | 0.7840 | 670 | 1.1202 | 36714872 |
0.1902 | 0.7898 | 675 | 1.1188 | 36987872 |
0.2204 | 0.7957 | 680 | 1.1201 | 37263224 |
0.3015 | 0.8015 | 685 | 1.1189 | 37536992 |
0.2118 | 0.8074 | 690 | 1.1192 | 37793976 |
0.2303 | 0.8132 | 695 | 1.1178 | 38068432 |
0.2148 | 0.8191 | 700 | 1.1194 | 38341616 |
0.2132 | 0.8249 | 705 | 1.1185 | 38610776 |
0.1463 | 0.8308 | 710 | 1.1194 | 38888584 |
0.1878 | 0.8366 | 715 | 1.1210 | 39160392 |
0.275 | 0.8425 | 720 | 1.1178 | 39426336 |
0.1686 | 0.8483 | 725 | 1.1164 | 39698280 |
0.1518 | 0.8542 | 730 | 1.1198 | 39967168 |
0.2153 | 0.8600 | 735 | 1.1186 | 40242904 |
0.22 | 0.8659 | 740 | 1.1163 | 40515024 |
0.2084 | 0.8717 | 745 | 1.1172 | 40786080 |
0.264 | 0.8776 | 750 | 1.1143 | 41059704 |
0.1918 | 0.8834 | 755 | 1.1147 | 41331008 |
0.2444 | 0.8893 | 760 | 1.1154 | 41603928 |
0.1433 | 0.8951 | 765 | 1.1158 | 41873784 |
0.2206 | 0.9010 | 770 | 1.1152 | 42140496 |
0.204 | 0.9068 | 775 | 1.1131 | 42415368 |
0.1427 | 0.9127 | 780 | 1.1143 | 42697792 |
0.2541 | 0.9185 | 785 | 1.1149 | 42976216 |
0.2033 | 0.9244 | 790 | 1.1160 | 43250816 |
0.1249 | 0.9302 | 795 | 1.1139 | 43531552 |
0.158 | 0.9361 | 800 | 1.1146 | 43811968 |
0.1552 | 0.9419 | 805 | 1.1154 | 44080032 |
0.1523 | 0.9478 | 810 | 1.1141 | 44351688 |
0.1709 | 0.9536 | 815 | 1.1129 | 44631208 |
0.133 | 0.9595 | 820 | 1.1129 | 44900816 |
0.2698 | 0.9653 | 825 | 1.1133 | 45173960 |
0.1856 | 0.9712 | 830 | 1.1131 | 45444920 |
0.2218 | 0.9770 | 835 | 1.1141 | 45715920 |
0.1803 | 0.9829 | 840 | 1.1164 | 45990080 |
0.2412 | 0.9887 | 845 | 1.1138 | 46264088 |
0.3314 | 0.9946 | 850 | 1.1103 | 46540904 |
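
Assuming the reported values are standard per-token cross-entropy losses, they translate directly to perplexity via `exp(loss)`; a quick check of the first and last rows:

```python
import math

# Per-token perplexity = exp(cross-entropy loss), assuming natural-log loss.
print(math.exp(1.3956))  # step 0:   ~4.04
print(math.exp(1.1103))  # step 850: ~3.04
```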

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1