collapse_gemma-2-2b_hs2_accumulate_iter16_sftsd0
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0961
- Num Input Tokens Seen: 83747264
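For intuition, the evaluation loss can be converted to a perplexity. The snippet below is a minimal sketch, assuming the reported loss is a mean per-token cross-entropy in nats. That is the usual convention for causal language model fine-tuning, but this card does not state it.

```python
import math

# Final evaluation loss reported in this card.
eval_loss = 1.0961

# Assumption: the loss is a mean per-token cross-entropy in nats,
# so perplexity is exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity = {perplexity:.2f}")
```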
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
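The batch-size and warmup settings above are related by simple arithmetic, sketched below. Two values are assumptions not stated in this card: `num_devices = 1` (chosen because 8 × 16 already gives the listed total of 128) and `total_steps = 1536` (estimated from the results table, where step 1535 corresponds to epoch 0.9991). The `lr_at` helper is hypothetical and only illustrates the general shape of a constant-with-warmup schedule; the exact step counts of the actual run may differ.

```python
import math

# Values listed in the hyperparameters above.
train_batch_size = 8
gradient_accumulation_steps = 16
base_lr = 8e-06
warmup_ratio = 0.05

# Assumptions (not stated in the card).
num_devices = 1
total_steps = 1536

# Effective number of sequences per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Number of optimizer steps spent ramping up the learning rate.
warmup_steps = math.ceil(warmup_ratio * total_steps)


def lr_at(step: int) -> float:
    """Linear ramp from 0 to base_lr over warmup_steps, then constant."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


print(total_train_batch_size, warmup_steps, lr_at(10), lr_at(1000))
```

Under these assumptions the learning rate reaches its full value after roughly the first 5% of optimizer steps and stays there for the rest of the epoch.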
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.6977 | 0.0033 | 5 | 1.3904 | 267480 |
1.6231 | 0.0065 | 10 | 1.3822 | 546376 |
1.6393 | 0.0098 | 15 | 1.3541 | 813352 |
1.6115 | 0.0130 | 20 | 1.3159 | 1087440 |
1.5146 | 0.0163 | 25 | 1.2692 | 1360104 |
1.3811 | 0.0195 | 30 | 1.2380 | 1632280 |
1.3167 | 0.0228 | 35 | 1.2115 | 1905536 |
1.2215 | 0.0260 | 40 | 1.1934 | 2181576 |
1.0274 | 0.0293 | 45 | 1.2132 | 2457192 |
1.0111 | 0.0325 | 50 | 1.2332 | 2730072 |
0.7786 | 0.0358 | 55 | 1.2677 | 3004960 |
0.7048 | 0.0391 | 60 | 1.3238 | 3274448 |
0.496 | 0.0423 | 65 | 1.3393 | 3553520 |
0.4427 | 0.0456 | 70 | 1.3378 | 3830472 |
0.3767 | 0.0488 | 75 | 1.3143 | 4098552 |
0.2799 | 0.0521 | 80 | 1.2953 | 4374904 |
0.2824 | 0.0553 | 85 | 1.2533 | 4648400 |
0.2927 | 0.0586 | 90 | 1.2330 | 4921568 |
0.2484 | 0.0618 | 95 | 1.2310 | 5191352 |
0.212 | 0.0651 | 100 | 1.2252 | 5452360 |
0.2664 | 0.0683 | 105 | 1.2230 | 5734152 |
0.21 | 0.0716 | 110 | 1.2056 | 6019488 |
0.1515 | 0.0748 | 115 | 1.2195 | 6292920 |
0.1698 | 0.0781 | 120 | 1.1982 | 6569024 |
0.2285 | 0.0814 | 125 | 1.1998 | 6836296 |
0.1607 | 0.0846 | 130 | 1.1986 | 7114480 |
0.2257 | 0.0879 | 135 | 1.1897 | 7387936 |
0.2385 | 0.0911 | 140 | 1.1908 | 7658304 |
0.1968 | 0.0944 | 145 | 1.1875 | 7928344 |
0.217 | 0.0976 | 150 | 1.1902 | 8196496 |
0.2652 | 0.1009 | 155 | 1.1856 | 8464072 |
0.1832 | 0.1041 | 160 | 1.1975 | 8742552 |
0.2104 | 0.1074 | 165 | 1.1769 | 9017192 |
0.16 | 0.1106 | 170 | 1.1837 | 9294488 |
0.1437 | 0.1139 | 175 | 1.1956 | 9570952 |
0.1413 | 0.1172 | 180 | 1.1816 | 9844568 |
0.1653 | 0.1204 | 185 | 1.1791 | 10121192 |
0.2069 | 0.1237 | 190 | 1.1849 | 10398000 |
0.26 | 0.1269 | 195 | 1.1766 | 10676048 |
0.115 | 0.1302 | 200 | 1.1750 | 10949568 |
0.2052 | 0.1334 | 205 | 1.1775 | 11223000 |
0.1257 | 0.1367 | 210 | 1.1725 | 11488904 |
0.1633 | 0.1399 | 215 | 1.1731 | 11764936 |
0.1164 | 0.1432 | 220 | 1.1728 | 12031792 |
0.1808 | 0.1464 | 225 | 1.1705 | 12298592 |
0.17 | 0.1497 | 230 | 1.1652 | 12575568 |
0.1615 | 0.1530 | 235 | 1.1685 | 12847744 |
0.1873 | 0.1562 | 240 | 1.1643 | 13114216 |
0.1351 | 0.1595 | 245 | 1.1633 | 13386824 |
0.1833 | 0.1627 | 250 | 1.1656 | 13667304 |
0.1678 | 0.1660 | 255 | 1.1616 | 13942040 |
0.2164 | 0.1692 | 260 | 1.1643 | 14213856 |
0.1614 | 0.1725 | 265 | 1.1635 | 14492648 |
0.201 | 0.1757 | 270 | 1.1574 | 14769328 |
0.1575 | 0.1790 | 275 | 1.1583 | 15040880 |
0.0709 | 0.1822 | 280 | 1.1596 | 15314256 |
0.1426 | 0.1855 | 285 | 1.1614 | 15585080 |
0.2168 | 0.1887 | 290 | 1.1594 | 15853120 |
0.1492 | 0.1920 | 295 | 1.1553 | 16121840 |
0.1689 | 0.1953 | 300 | 1.1552 | 16396264 |
0.1257 | 0.1985 | 305 | 1.1585 | 16674688 |
0.1253 | 0.2018 | 310 | 1.1526 | 16942584 |
0.2038 | 0.2050 | 315 | 1.1523 | 17210344 |
0.1456 | 0.2083 | 320 | 1.1586 | 17482832 |
0.2111 | 0.2115 | 325 | 1.1511 | 17757680 |
0.1601 | 0.2148 | 330 | 1.1514 | 18025312 |
0.1497 | 0.2180 | 335 | 1.1558 | 18293584 |
0.1666 | 0.2213 | 340 | 1.1546 | 18564456 |
0.1549 | 0.2245 | 345 | 1.1534 | 18837520 |
0.1407 | 0.2278 | 350 | 1.1475 | 19109768 |
0.152 | 0.2311 | 355 | 1.1481 | 19378528 |
0.1271 | 0.2343 | 360 | 1.1508 | 19645976 |
0.2376 | 0.2376 | 365 | 1.1506 | 19919384 |
0.1676 | 0.2408 | 370 | 1.1442 | 20190048 |
0.1521 | 0.2441 | 375 | 1.1461 | 20460528 |
0.1239 | 0.2473 | 380 | 1.1461 | 20731456 |
0.1187 | 0.2506 | 385 | 1.1431 | 21003208 |
0.125 | 0.2538 | 390 | 1.1477 | 21279600 |
0.0886 | 0.2571 | 395 | 1.1464 | 21551976 |
0.1044 | 0.2603 | 400 | 1.1446 | 21828520 |
0.0939 | 0.2636 | 405 | 1.1426 | 22087472 |
0.2185 | 0.2669 | 410 | 1.1426 | 22361984 |
0.1973 | 0.2701 | 415 | 1.1423 | 22634528 |
0.1189 | 0.2734 | 420 | 1.1407 | 22903592 |
0.1874 | 0.2766 | 425 | 1.1458 | 23182968 |
0.1436 | 0.2799 | 430 | 1.1444 | 23457368 |
0.1013 | 0.2831 | 435 | 1.1400 | 23728672 |
0.0814 | 0.2864 | 440 | 1.1409 | 23998096 |
0.1595 | 0.2896 | 445 | 1.1373 | 24274528 |
0.1271 | 0.2929 | 450 | 1.1355 | 24547816 |
0.2012 | 0.2961 | 455 | 1.1403 | 24819280 |
0.0965 | 0.2994 | 460 | 1.1385 | 25093256 |
0.1874 | 0.3026 | 465 | 1.1337 | 25361488 |
0.1178 | 0.3059 | 470 | 1.1363 | 25637312 |
0.1534 | 0.3092 | 475 | 1.1386 | 25912704 |
0.1469 | 0.3124 | 480 | 1.1361 | 26190688 |
0.1061 | 0.3157 | 485 | 1.1348 | 26462392 |
0.1136 | 0.3189 | 490 | 1.1371 | 26736472 |
0.1616 | 0.3222 | 495 | 1.1345 | 27005736 |
0.174 | 0.3254 | 500 | 1.1309 | 27275144 |
0.1533 | 0.3287 | 505 | 1.1327 | 27547808 |
0.0785 | 0.3319 | 510 | 1.1337 | 27822136 |
0.1492 | 0.3352 | 515 | 1.1340 | 28094008 |
0.0934 | 0.3384 | 520 | 1.1332 | 28363160 |
0.0926 | 0.3417 | 525 | 1.1333 | 28631816 |
0.1622 | 0.3450 | 530 | 1.1336 | 28902896 |
0.2129 | 0.3482 | 535 | 1.1293 | 29169464 |
0.1444 | 0.3515 | 540 | 1.1309 | 29439240 |
0.1702 | 0.3547 | 545 | 1.1345 | 29716280 |
0.1225 | 0.3580 | 550 | 1.1321 | 29993952 |
0.1474 | 0.3612 | 555 | 1.1290 | 30256456 |
0.1054 | 0.3645 | 560 | 1.1285 | 30529464 |
0.1311 | 0.3677 | 565 | 1.1304 | 30797176 |
0.122 | 0.3710 | 570 | 1.1283 | 31069896 |
0.1272 | 0.3742 | 575 | 1.1275 | 31347944 |
0.1057 | 0.3775 | 580 | 1.1261 | 31616208 |
0.1029 | 0.3808 | 585 | 1.1262 | 31883784 |
0.1004 | 0.3840 | 590 | 1.1285 | 32152184 |
0.1192 | 0.3873 | 595 | 1.1288 | 32428272 |
0.1002 | 0.3905 | 600 | 1.1267 | 32695056 |
0.1983 | 0.3938 | 605 | 1.1244 | 32972776 |
0.1104 | 0.3970 | 610 | 1.1253 | 33238272 |
0.1996 | 0.4003 | 615 | 1.1245 | 33516992 |
0.197 | 0.4035 | 620 | 1.1221 | 33794448 |
0.1456 | 0.4068 | 625 | 1.1266 | 34070520 |
0.1281 | 0.4100 | 630 | 1.1253 | 34348056 |
0.131 | 0.4133 | 635 | 1.1230 | 34619448 |
0.1238 | 0.4165 | 640 | 1.1234 | 34888632 |
0.1576 | 0.4198 | 645 | 1.1227 | 35157256 |
0.1504 | 0.4231 | 650 | 1.1218 | 35431744 |
0.0859 | 0.4263 | 655 | 1.1240 | 35705256 |
0.1568 | 0.4296 | 660 | 1.1229 | 35982960 |
0.0974 | 0.4328 | 665 | 1.1228 | 36257560 |
0.1669 | 0.4361 | 670 | 1.1220 | 36529088 |
0.1412 | 0.4393 | 675 | 1.1219 | 36800584 |
0.1568 | 0.4426 | 680 | 1.1222 | 37066896 |
0.1134 | 0.4458 | 685 | 1.1230 | 37331944 |
0.1331 | 0.4491 | 690 | 1.1205 | 37605720 |
0.1193 | 0.4523 | 695 | 1.1221 | 37879216 |
0.1605 | 0.4556 | 700 | 1.1220 | 38152288 |
0.1477 | 0.4589 | 705 | 1.1202 | 38424752 |
0.1072 | 0.4621 | 710 | 1.1213 | 38699024 |
0.1054 | 0.4654 | 715 | 1.1217 | 38977888 |
0.1341 | 0.4686 | 720 | 1.1184 | 39244072 |
0.1506 | 0.4719 | 725 | 1.1204 | 39515728 |
0.0906 | 0.4751 | 730 | 1.1234 | 39783920 |
0.112 | 0.4784 | 735 | 1.1203 | 40057912 |
0.1191 | 0.4816 | 740 | 1.1187 | 40325856 |
0.1385 | 0.4849 | 745 | 1.1202 | 40595664 |
0.1596 | 0.4881 | 750 | 1.1200 | 40866448 |
0.1369 | 0.4914 | 755 | 1.1197 | 41139656 |
0.1527 | 0.4947 | 760 | 1.1191 | 41413096 |
0.1858 | 0.4979 | 765 | 1.1184 | 41686816 |
0.1657 | 0.5012 | 770 | 1.1188 | 41960952 |
0.1361 | 0.5044 | 775 | 1.1186 | 42233320 |
0.1111 | 0.5077 | 780 | 1.1170 | 42512688 |
0.2046 | 0.5109 | 785 | 1.1156 | 42786608 |
0.1296 | 0.5142 | 790 | 1.1167 | 43056424 |
0.1377 | 0.5174 | 795 | 1.1153 | 43325520 |
0.227 | 0.5207 | 800 | 1.1177 | 43597816 |
0.1783 | 0.5239 | 805 | 1.1181 | 43872768 |
0.0828 | 0.5272 | 810 | 1.1158 | 44144856 |
0.1561 | 0.5304 | 815 | 1.1144 | 44415728 |
0.0962 | 0.5337 | 820 | 1.1154 | 44686760 |
0.1132 | 0.5370 | 825 | 1.1163 | 44961368 |
0.1411 | 0.5402 | 830 | 1.1150 | 45234936 |
0.1462 | 0.5435 | 835 | 1.1130 | 45505456 |
0.1142 | 0.5467 | 840 | 1.1154 | 45785136 |
0.1396 | 0.5500 | 845 | 1.1148 | 46059216 |
0.1182 | 0.5532 | 850 | 1.1146 | 46337200 |
0.09 | 0.5565 | 855 | 1.1149 | 46609168 |
0.1277 | 0.5597 | 860 | 1.1153 | 46883312 |
0.1059 | 0.5630 | 865 | 1.1146 | 47154656 |
0.1286 | 0.5662 | 870 | 1.1166 | 47427296 |
0.1492 | 0.5695 | 875 | 1.1149 | 47706016 |
0.081 | 0.5728 | 880 | 1.1137 | 47982128 |
0.1817 | 0.5760 | 885 | 1.1153 | 48257272 |
0.1306 | 0.5793 | 890 | 1.1138 | 48521920 |
0.0595 | 0.5825 | 895 | 1.1127 | 48793744 |
0.1367 | 0.5858 | 900 | 1.1134 | 49061760 |
0.1132 | 0.5890 | 905 | 1.1127 | 49332680 |
0.1874 | 0.5923 | 910 | 1.1120 | 49606352 |
0.1164 | 0.5955 | 915 | 1.1140 | 49880296 |
0.1582 | 0.5988 | 920 | 1.1158 | 50155816 |
0.1034 | 0.6020 | 925 | 1.1124 | 50429424 |
0.1633 | 0.6053 | 930 | 1.1120 | 50702816 |
0.1397 | 0.6086 | 935 | 1.1123 | 50973880 |
0.1896 | 0.6118 | 940 | 1.1106 | 51246448 |
0.1491 | 0.6151 | 945 | 1.1107 | 51525888 |
0.1698 | 0.6183 | 950 | 1.1122 | 51800344 |
0.17 | 0.6216 | 955 | 1.1121 | 52075368 |
0.1333 | 0.6248 | 960 | 1.1113 | 52345416 |
0.1389 | 0.6281 | 965 | 1.1111 | 52618784 |
0.0914 | 0.6313 | 970 | 1.1103 | 52892760 |
0.189 | 0.6346 | 975 | 1.1104 | 53163768 |
0.1022 | 0.6378 | 980 | 1.1123 | 53443080 |
0.1381 | 0.6411 | 985 | 1.1125 | 53712248 |
0.1852 | 0.6443 | 990 | 1.1109 | 53981464 |
0.0943 | 0.6476 | 995 | 1.1101 | 54257952 |
0.1065 | 0.6509 | 1000 | 1.1105 | 54524264 |
0.064 | 0.6541 | 1005 | 1.1095 | 54797736 |
0.106 | 0.6574 | 1010 | 1.1092 | 55071296 |
0.1678 | 0.6606 | 1015 | 1.1108 | 55347736 |
0.1661 | 0.6639 | 1020 | 1.1092 | 55626920 |
0.1316 | 0.6671 | 1025 | 1.1080 | 55898728 |
0.1826 | 0.6704 | 1030 | 1.1097 | 56182624 |
0.0788 | 0.6736 | 1035 | 1.1093 | 56457992 |
0.19 | 0.6769 | 1040 | 1.1098 | 56724296 |
0.188 | 0.6801 | 1045 | 1.1077 | 56989184 |
0.1385 | 0.6834 | 1050 | 1.1075 | 57257216 |
0.1269 | 0.6867 | 1055 | 1.1073 | 57526920 |
0.1571 | 0.6899 | 1060 | 1.1077 | 57795592 |
0.1011 | 0.6932 | 1065 | 1.1082 | 58071336 |
0.1271 | 0.6964 | 1070 | 1.1060 | 58339632 |
0.1742 | 0.6997 | 1075 | 1.1058 | 58608896 |
0.1234 | 0.7029 | 1080 | 1.1078 | 58888352 |
0.1118 | 0.7062 | 1085 | 1.1100 | 59152664 |
0.1259 | 0.7094 | 1090 | 1.1091 | 59419984 |
0.0922 | 0.7127 | 1095 | 1.1060 | 59692648 |
0.1722 | 0.7159 | 1100 | 1.1057 | 59958744 |
0.1636 | 0.7192 | 1105 | 1.1064 | 60235568 |
0.1533 | 0.7225 | 1110 | 1.1066 | 60514536 |
0.127 | 0.7257 | 1115 | 1.1062 | 60798376 |
0.0895 | 0.7290 | 1120 | 1.1069 | 61067824 |
0.1614 | 0.7322 | 1125 | 1.1064 | 61338640 |
0.1656 | 0.7355 | 1130 | 1.1046 | 61612920 |
0.0904 | 0.7387 | 1135 | 1.1031 | 61890776 |
0.1511 | 0.7420 | 1140 | 1.1071 | 62162384 |
0.1038 | 0.7452 | 1145 | 1.1090 | 62436328 |
0.1673 | 0.7485 | 1150 | 1.1046 | 62715208 |
0.0784 | 0.7517 | 1155 | 1.1047 | 62979656 |
0.1476 | 0.7550 | 1160 | 1.1067 | 63245792 |
0.0983 | 0.7582 | 1165 | 1.1055 | 63522624 |
0.13 | 0.7615 | 1170 | 1.1043 | 63801704 |
0.0637 | 0.7648 | 1175 | 1.1045 | 64069376 |
0.1381 | 0.7680 | 1180 | 1.1050 | 64334200 |
0.0887 | 0.7713 | 1185 | 1.1055 | 64608848 |
0.1741 | 0.7745 | 1190 | 1.1063 | 64880872 |
0.246 | 0.7778 | 1195 | 1.1053 | 65154472 |
0.1015 | 0.7810 | 1200 | 1.1041 | 65430448 |
0.1879 | 0.7843 | 1205 | 1.1035 | 65709944 |
0.078 | 0.7875 | 1210 | 1.1042 | 65972992 |
0.1773 | 0.7908 | 1215 | 1.1039 | 66249816 |
0.1293 | 0.7940 | 1220 | 1.1030 | 66519368 |
0.1672 | 0.7973 | 1225 | 1.1028 | 66792112 |
0.1473 | 0.8006 | 1230 | 1.1032 | 67068496 |
0.112 | 0.8038 | 1235 | 1.1043 | 67343696 |
0.1381 | 0.8071 | 1240 | 1.1048 | 67617168 |
0.1389 | 0.8103 | 1245 | 1.1028 | 67887816 |
0.1471 | 0.8136 | 1250 | 1.1019 | 68169056 |
0.1123 | 0.8168 | 1255 | 1.1049 | 68437208 |
0.0895 | 0.8201 | 1260 | 1.1037 | 68708704 |
0.1086 | 0.8233 | 1265 | 1.1023 | 68984064 |
0.0925 | 0.8266 | 1270 | 1.1035 | 69258072 |
0.1126 | 0.8298 | 1275 | 1.1044 | 69540496 |
0.1242 | 0.8331 | 1280 | 1.1022 | 69807992 |
0.1214 | 0.8364 | 1285 | 1.1014 | 70085288 |
0.0738 | 0.8396 | 1290 | 1.1027 | 70355512 |
0.1319 | 0.8429 | 1295 | 1.1035 | 70622344 |
0.0871 | 0.8461 | 1300 | 1.1021 | 70900592 |
0.1327 | 0.8494 | 1305 | 1.1024 | 71163480 |
0.2031 | 0.8526 | 1310 | 1.1023 | 71440296 |
0.1485 | 0.8559 | 1315 | 1.1005 | 71708720 |
0.1236 | 0.8591 | 1320 | 1.1013 | 71972552 |
0.1006 | 0.8624 | 1325 | 1.1020 | 72231184 |
0.1416 | 0.8656 | 1330 | 1.1024 | 72496672 |
0.1379 | 0.8689 | 1335 | 1.1000 | 72764816 |
0.147 | 0.8721 | 1340 | 1.0979 | 73037760 |
0.1148 | 0.8754 | 1345 | 1.1011 | 73304008 |
0.1496 | 0.8787 | 1350 | 1.1011 | 73575048 |
0.1097 | 0.8819 | 1355 | 1.0993 | 73861176 |
0.1041 | 0.8852 | 1360 | 1.0999 | 74132272 |
0.1422 | 0.8884 | 1365 | 1.1005 | 74408816 |
0.1453 | 0.8917 | 1370 | 1.1010 | 74678792 |
0.1158 | 0.8949 | 1375 | 1.1030 | 74953160 |
0.169 | 0.8982 | 1380 | 1.1010 | 75228000 |
0.1493 | 0.9014 | 1385 | 1.1008 | 75506472 |
0.1169 | 0.9047 | 1390 | 1.1016 | 75775120 |
0.1323 | 0.9079 | 1395 | 1.1014 | 76047336 |
0.1254 | 0.9112 | 1400 | 1.0992 | 76319824 |
0.128 | 0.9145 | 1405 | 1.1008 | 76595032 |
0.1146 | 0.9177 | 1410 | 1.1022 | 76868576 |
0.0559 | 0.9210 | 1415 | 1.0989 | 77152104 |
0.0976 | 0.9242 | 1420 | 1.0992 | 77425528 |
0.0594 | 0.9275 | 1425 | 1.1008 | 77704232 |
0.1184 | 0.9307 | 1430 | 1.1014 | 77968744 |
0.1228 | 0.9340 | 1435 | 1.1005 | 78238920 |
0.0972 | 0.9372 | 1440 | 1.0995 | 78511992 |
0.1072 | 0.9405 | 1445 | 1.1005 | 78789384 |
0.171 | 0.9437 | 1450 | 1.1021 | 79066328 |
0.1315 | 0.9470 | 1455 | 1.1015 | 79339152 |
0.1202 | 0.9503 | 1460 | 1.0999 | 79615856 |
0.1061 | 0.9535 | 1465 | 1.0991 | 79886016 |
0.1329 | 0.9568 | 1470 | 1.1005 | 80160064 |
0.1441 | 0.9600 | 1475 | 1.1021 | 80433496 |
0.1115 | 0.9633 | 1480 | 1.1006 | 80704024 |
0.141 | 0.9665 | 1485 | 1.0980 | 80980320 |
0.1129 | 0.9698 | 1490 | 1.0995 | 81257896 |
0.1392 | 0.9730 | 1495 | 1.0994 | 81526496 |
0.1225 | 0.9763 | 1500 | 1.0996 | 81793848 |
0.0584 | 0.9795 | 1505 | 1.0993 | 82064296 |
0.1258 | 0.9828 | 1510 | 1.0992 | 82337664 |
0.1247 | 0.9860 | 1515 | 1.0984 | 82611632 |
0.1923 | 0.9893 | 1520 | 1.0968 | 82886368 |
0.1801 | 0.9926 | 1525 | 1.0976 | 83157384 |
0.1305 | 0.9958 | 1530 | 1.0976 | 83419032 |
0.1318 | 0.9991 | 1535 | 1.0958 | 83692664 |
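The table also allows a rough estimate of throughput per optimizer step. The sketch below uses a few rows copied from the table and the `total_train_batch_size` of 128 listed earlier. It assumes that "Input Tokens Seen" counts the tokens of the training sequences only, so the per-sequence figure should be read as an approximation.

```python
# A few rows copied from the table: (step, validation loss, input tokens seen).
rows = [
    (0, 1.3909, 0),
    (40, 1.1934, 2181576),
    (65, 1.3393, 3553520),
    (1535, 1.0958, 83692664),
]

# Average number of input tokens consumed per optimizer step.
last_step, last_loss, last_tokens = rows[-1]
tokens_per_step = last_tokens / last_step

# Approximate average sequence length, given 128 sequences per step.
sequences_per_step = 128
tokens_per_sequence = tokens_per_step / sequences_per_step

# Row with the lowest validation loss among the sampled rows.
best_step, best_loss, _ = min(rows, key=lambda r: r[1])

print(f"tokens per step: {tokens_per_step:.0f}")
print(f"tokens per sequence: {tokens_per_sequence:.0f}")
print(f"lowest sampled validation loss: {best_loss} at step {best_step}")
```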
Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter16_sftsd0
Base model
google/gemma-2-2b