collapse_gemma-2-2b_hs2_accumulate_iter19_sftsd2
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1080
- Num Input Tokens Seen: 98464968
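The card does not say how the loss is defined. If it is the mean per-token cross-entropy in nats (the usual convention for causal language-model evaluation, assumed here), it converts to perplexity by exponentiation. A minimal sketch:

```python
import math

def perplexity_from_loss(mean_nll_nats: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity."""
    return math.exp(mean_nll_nats)

# Final evaluation loss reported above.
# Assumption: it is a mean token-level NLL in nats.
eval_loss = 1.1080
print(f"perplexity ~ {perplexity_from_loss(eval_loss):.3f}")
```

Under that assumption, the final evaluation loss corresponds to a perplexity of roughly 3.03.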
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
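The reported total_train_batch_size is the per-device batch size times the gradient accumulation steps (128 = 8 × 16). This suggests a single device, although the card does not state that directly. The plain-Python sketch below reproduces that arithmetic and the shape of a constant-with-warmup schedule: a linear ramp from zero to the base learning rate, then a flat rate. It is an illustration, not the Transformers implementation. The helper names are hypothetical. The total step count is an assumption inferred from the results table. Rounding the warmup ratio up to a whole step is also an assumption.

```python
import math

# Hyperparameters copied from the list above.
LEARNING_RATE = 8e-06
TRAIN_BATCH_SIZE = 8          # per device
GRAD_ACCUM_STEPS = 16
WARMUP_RATIO = 0.05

def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Number of sequences that contribute to one optimizer step."""
    return per_device * accum_steps * num_devices

def warmup_steps(num_training_steps: int, warmup_ratio: float) -> int:
    """Turn the warmup fraction into whole optimizer steps (rounding up is assumed)."""
    return math.ceil(num_training_steps * warmup_ratio)

def constant_with_warmup_lr(step: int, base_lr: float, n_warmup: int) -> float:
    """Linear ramp from 0 to base_lr over n_warmup steps, then constant."""
    if step < n_warmup:
        return base_lr * step / max(1, n_warmup)
    return base_lr

# Assumption: about 1,828 optimizer steps in the single epoch, inferred from the
# step/epoch columns of the results table (step 1825 is logged at epoch 0.9984).
NUM_TRAINING_STEPS = 1828

n_warmup = warmup_steps(NUM_TRAINING_STEPS, WARMUP_RATIO)
print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
print("warmup steps:", n_warmup)
for step in (0, n_warmup // 2, n_warmup, NUM_TRAINING_STEPS - 1):
    print(step, constant_with_warmup_lr(step, LEARNING_RATE, n_warmup))
```

With these inputs, the learning rate reaches 8e-06 after roughly the first 5% of the epoch. It then stays constant, because no decay phase is configured.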
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.566 | 0.0027 | 5 | 1.3903 | 269240 |
1.6034 | 0.0055 | 10 | 1.3837 | 534920 |
1.5271 | 0.0082 | 15 | 1.3636 | 803440 |
1.5903 | 0.0109 | 20 | 1.3339 | 1075112 |
1.4414 | 0.0137 | 25 | 1.2941 | 1347136 |
1.3672 | 0.0164 | 30 | 1.2546 | 1608088 |
1.3486 | 0.0191 | 35 | 1.2329 | 1883384 |
1.2328 | 0.0219 | 40 | 1.2053 | 2147208 |
1.0687 | 0.0246 | 45 | 1.2000 | 2413576 |
1.0229 | 0.0274 | 50 | 1.2220 | 2678528 |
1.0549 | 0.0301 | 55 | 1.2386 | 2944424 |
0.8001 | 0.0328 | 60 | 1.2753 | 3212016 |
0.827 | 0.0356 | 65 | 1.2962 | 3480872 |
0.6043 | 0.0383 | 70 | 1.2797 | 3745240 |
0.5734 | 0.0410 | 75 | 1.3074 | 4008872 |
0.5398 | 0.0438 | 80 | 1.2996 | 4283816 |
0.3939 | 0.0465 | 85 | 1.2942 | 4560864 |
0.4721 | 0.0492 | 90 | 1.2616 | 4834608 |
0.3256 | 0.0520 | 95 | 1.2648 | 5110216 |
0.3049 | 0.0547 | 100 | 1.2537 | 5380104 |
0.3462 | 0.0574 | 105 | 1.2385 | 5655128 |
0.33 | 0.0602 | 110 | 1.2168 | 5924072 |
0.2754 | 0.0629 | 115 | 1.2392 | 6191600 |
0.2427 | 0.0657 | 120 | 1.2336 | 6461192 |
0.2005 | 0.0684 | 125 | 1.2499 | 6729224 |
0.2823 | 0.0711 | 130 | 1.2266 | 7001096 |
0.2541 | 0.0739 | 135 | 1.2273 | 7267936 |
0.2096 | 0.0766 | 140 | 1.2258 | 7534808 |
0.3232 | 0.0793 | 145 | 1.2267 | 7807176 |
0.1664 | 0.0821 | 150 | 1.2066 | 8075744 |
0.1768 | 0.0848 | 155 | 1.2219 | 8346440 |
0.295 | 0.0875 | 160 | 1.2019 | 8614872 |
0.2027 | 0.0903 | 165 | 1.2076 | 8884480 |
0.1438 | 0.0930 | 170 | 1.2002 | 9150992 |
0.2161 | 0.0957 | 175 | 1.2000 | 9419696 |
0.152 | 0.0985 | 180 | 1.2019 | 9687968 |
0.1512 | 0.1012 | 185 | 1.2031 | 9960504 |
0.2432 | 0.1039 | 190 | 1.1993 | 10226056 |
0.1829 | 0.1067 | 195 | 1.2022 | 10496824 |
0.2622 | 0.1094 | 200 | 1.1973 | 10767928 |
0.1414 | 0.1122 | 205 | 1.2021 | 11032952 |
0.16 | 0.1149 | 210 | 1.1940 | 11305872 |
0.1348 | 0.1176 | 215 | 1.1937 | 11576432 |
0.1244 | 0.1204 | 220 | 1.1973 | 11851016 |
0.1393 | 0.1231 | 225 | 1.1868 | 12122248 |
0.1372 | 0.1258 | 230 | 1.1894 | 12391544 |
0.2017 | 0.1286 | 235 | 1.1933 | 12659424 |
0.2157 | 0.1313 | 240 | 1.1891 | 12935120 |
0.1222 | 0.1340 | 245 | 1.1908 | 13209320 |
0.1636 | 0.1368 | 250 | 1.1943 | 13487384 |
0.1724 | 0.1395 | 255 | 1.1907 | 13749784 |
0.1335 | 0.1422 | 260 | 1.1872 | 14021680 |
0.143 | 0.1450 | 265 | 1.1853 | 14289216 |
0.0801 | 0.1477 | 270 | 1.1805 | 14557664 |
0.1594 | 0.1504 | 275 | 1.1833 | 14821848 |
0.096 | 0.1532 | 280 | 1.1829 | 15088088 |
0.1399 | 0.1559 | 285 | 1.1820 | 15362192 |
0.1576 | 0.1587 | 290 | 1.1823 | 15635952 |
0.1494 | 0.1614 | 295 | 1.1823 | 15903016 |
0.0803 | 0.1641 | 300 | 1.1824 | 16170560 |
0.1924 | 0.1669 | 305 | 1.1808 | 16433512 |
0.153 | 0.1696 | 310 | 1.1789 | 16703256 |
0.1633 | 0.1723 | 315 | 1.1771 | 16969712 |
0.2378 | 0.1751 | 320 | 1.1752 | 17232240 |
0.1494 | 0.1778 | 325 | 1.1740 | 17497112 |
0.1492 | 0.1805 | 330 | 1.1671 | 17756264 |
0.1529 | 0.1833 | 335 | 1.1758 | 18025664 |
0.102 | 0.1860 | 340 | 1.1801 | 18289112 |
0.1487 | 0.1887 | 345 | 1.1672 | 18557120 |
0.1233 | 0.1915 | 350 | 1.1746 | 18822168 |
0.1628 | 0.1942 | 355 | 1.1735 | 19093248 |
0.2002 | 0.1970 | 360 | 1.1701 | 19364880 |
0.1004 | 0.1997 | 365 | 1.1704 | 19622184 |
0.1483 | 0.2024 | 370 | 1.1693 | 19896912 |
0.1489 | 0.2052 | 375 | 1.1689 | 20167288 |
0.1008 | 0.2079 | 380 | 1.1702 | 20430640 |
0.1688 | 0.2106 | 385 | 1.1733 | 20691032 |
0.1465 | 0.2134 | 390 | 1.1709 | 20966304 |
0.1459 | 0.2161 | 395 | 1.1658 | 21240608 |
0.0888 | 0.2188 | 400 | 1.1679 | 21510808 |
0.0849 | 0.2216 | 405 | 1.1704 | 21780056 |
0.1339 | 0.2243 | 410 | 1.1663 | 22048176 |
0.1438 | 0.2270 | 415 | 1.1653 | 22318760 |
0.1227 | 0.2298 | 420 | 1.1686 | 22588136 |
0.1704 | 0.2325 | 425 | 1.1675 | 22858288 |
0.1241 | 0.2352 | 430 | 1.1626 | 23127464 |
0.0752 | 0.2380 | 435 | 1.1648 | 23395536 |
0.102 | 0.2407 | 440 | 1.1658 | 23663200 |
0.1354 | 0.2435 | 445 | 1.1626 | 23931888 |
0.1178 | 0.2462 | 450 | 1.1594 | 24207088 |
0.1556 | 0.2489 | 455 | 1.1623 | 24477104 |
0.1249 | 0.2517 | 460 | 1.1635 | 24748184 |
0.135 | 0.2544 | 465 | 1.1580 | 25015840 |
0.0997 | 0.2571 | 470 | 1.1613 | 25274800 |
0.183 | 0.2599 | 475 | 1.1565 | 25538784 |
0.1524 | 0.2626 | 480 | 1.1608 | 25811528 |
0.1299 | 0.2653 | 485 | 1.1596 | 26081360 |
0.1025 | 0.2681 | 490 | 1.1542 | 26348112 |
0.1945 | 0.2708 | 495 | 1.1577 | 26607088 |
0.1138 | 0.2735 | 500 | 1.1553 | 26879376 |
0.1872 | 0.2763 | 505 | 1.1565 | 27151544 |
0.1329 | 0.2790 | 510 | 1.1523 | 27416848 |
0.1458 | 0.2817 | 515 | 1.1539 | 27684560 |
0.1452 | 0.2845 | 520 | 1.1551 | 27954048 |
0.1423 | 0.2872 | 525 | 1.1534 | 28220112 |
0.0822 | 0.2900 | 530 | 1.1515 | 28494712 |
0.1274 | 0.2927 | 535 | 1.1528 | 28761360 |
0.1398 | 0.2954 | 540 | 1.1513 | 29029520 |
0.1851 | 0.2982 | 545 | 1.1519 | 29301128 |
0.1634 | 0.3009 | 550 | 1.1573 | 29575880 |
0.0755 | 0.3036 | 555 | 1.1507 | 29842672 |
0.1418 | 0.3064 | 560 | 1.1509 | 30111712 |
0.1712 | 0.3091 | 565 | 1.1518 | 30379128 |
0.1914 | 0.3118 | 570 | 1.1454 | 30647232 |
0.1385 | 0.3146 | 575 | 1.1479 | 30905808 |
0.1112 | 0.3173 | 580 | 1.1553 | 31173776 |
0.114 | 0.3200 | 585 | 1.1492 | 31445496 |
0.1453 | 0.3228 | 590 | 1.1466 | 31710376 |
0.1367 | 0.3255 | 595 | 1.1467 | 31983840 |
0.1817 | 0.3283 | 600 | 1.1466 | 32260416 |
0.0955 | 0.3310 | 605 | 1.1466 | 32524816 |
0.1501 | 0.3337 | 610 | 1.1478 | 32796840 |
0.1512 | 0.3365 | 615 | 1.1445 | 33060472 |
0.1882 | 0.3392 | 620 | 1.1483 | 33327176 |
0.086 | 0.3419 | 625 | 1.1461 | 33596136 |
0.0982 | 0.3447 | 630 | 1.1438 | 33864384 |
0.1159 | 0.3474 | 635 | 1.1442 | 34132792 |
0.1253 | 0.3501 | 640 | 1.1473 | 34409424 |
0.1011 | 0.3529 | 645 | 1.1501 | 34679192 |
0.0758 | 0.3556 | 650 | 1.1444 | 34951400 |
0.1218 | 0.3583 | 655 | 1.1435 | 35227760 |
0.1123 | 0.3611 | 660 | 1.1459 | 35502112 |
0.1567 | 0.3638 | 665 | 1.1445 | 35771360 |
0.1027 | 0.3665 | 670 | 1.1425 | 36035184 |
0.1843 | 0.3693 | 675 | 1.1420 | 36301448 |
0.1262 | 0.3720 | 680 | 1.1451 | 36569816 |
0.2147 | 0.3748 | 685 | 1.1414 | 36840840 |
0.1026 | 0.3775 | 690 | 1.1389 | 37109896 |
0.148 | 0.3802 | 695 | 1.1421 | 37380912 |
0.1004 | 0.3830 | 700 | 1.1412 | 37651504 |
0.1456 | 0.3857 | 705 | 1.1399 | 37915944 |
0.1328 | 0.3884 | 710 | 1.1408 | 38187424 |
0.1516 | 0.3912 | 715 | 1.1409 | 38452472 |
0.1423 | 0.3939 | 720 | 1.1401 | 38725544 |
0.0693 | 0.3966 | 725 | 1.1411 | 39002936 |
0.145 | 0.3994 | 730 | 1.1369 | 39283776 |
0.0908 | 0.4021 | 735 | 1.1383 | 39556048 |
0.1767 | 0.4048 | 740 | 1.1386 | 39827376 |
0.0885 | 0.4076 | 745 | 1.1348 | 40095824 |
0.1309 | 0.4103 | 750 | 1.1362 | 40365560 |
0.1427 | 0.4130 | 755 | 1.1353 | 40639216 |
0.1133 | 0.4158 | 760 | 1.1353 | 40905800 |
0.0964 | 0.4185 | 765 | 1.1383 | 41184040 |
0.1247 | 0.4213 | 770 | 1.1381 | 41449464 |
0.1036 | 0.4240 | 775 | 1.1375 | 41718976 |
0.0943 | 0.4267 | 780 | 1.1362 | 41988192 |
0.1213 | 0.4295 | 785 | 1.1392 | 42252704 |
0.0691 | 0.4322 | 790 | 1.1395 | 42515528 |
0.0992 | 0.4349 | 795 | 1.1395 | 42778600 |
0.1536 | 0.4377 | 800 | 1.1385 | 43050368 |
0.1415 | 0.4404 | 805 | 1.1359 | 43317456 |
0.107 | 0.4431 | 810 | 1.1363 | 43579336 |
0.0727 | 0.4459 | 815 | 1.1364 | 43851016 |
0.1084 | 0.4486 | 820 | 1.1350 | 44119640 |
0.1327 | 0.4513 | 825 | 1.1331 | 44388024 |
0.1569 | 0.4541 | 830 | 1.1323 | 44658024 |
0.0889 | 0.4568 | 835 | 1.1370 | 44926688 |
0.1194 | 0.4596 | 840 | 1.1365 | 45197880 |
0.1314 | 0.4623 | 845 | 1.1336 | 45473008 |
0.0597 | 0.4650 | 850 | 1.1368 | 45745296 |
0.055 | 0.4678 | 855 | 1.1364 | 46015736 |
0.1205 | 0.4705 | 860 | 1.1365 | 46293592 |
0.1283 | 0.4732 | 865 | 1.1349 | 46568376 |
0.1195 | 0.4760 | 870 | 1.1309 | 46840344 |
0.1391 | 0.4787 | 875 | 1.1355 | 47110488 |
0.1324 | 0.4814 | 880 | 1.1324 | 47377704 |
0.1026 | 0.4842 | 885 | 1.1300 | 47635920 |
0.1301 | 0.4869 | 890 | 1.1327 | 47908560 |
0.124 | 0.4896 | 895 | 1.1341 | 48177136 |
0.1325 | 0.4924 | 900 | 1.1297 | 48446816 |
0.1446 | 0.4951 | 905 | 1.1292 | 48714528 |
0.1525 | 0.4978 | 910 | 1.1325 | 48984416 |
0.1612 | 0.5006 | 915 | 1.1309 | 49255328 |
0.1277 | 0.5033 | 920 | 1.1285 | 49522720 |
0.141 | 0.5061 | 925 | 1.1295 | 49794632 |
0.1233 | 0.5088 | 930 | 1.1309 | 50059784 |
0.0937 | 0.5115 | 935 | 1.1296 | 50334856 |
0.1243 | 0.5143 | 940 | 1.1282 | 50606240 |
0.1368 | 0.5170 | 945 | 1.1296 | 50873896 |
0.1006 | 0.5197 | 950 | 1.1287 | 51146576 |
0.0868 | 0.5225 | 955 | 1.1274 | 51419272 |
0.1008 | 0.5252 | 960 | 1.1267 | 51691160 |
0.1 | 0.5279 | 965 | 1.1308 | 51958064 |
0.0645 | 0.5307 | 970 | 1.1296 | 52226760 |
0.0955 | 0.5334 | 975 | 1.1296 | 52497824 |
0.13 | 0.5361 | 980 | 1.1304 | 52770296 |
0.1249 | 0.5389 | 985 | 1.1275 | 53040968 |
0.1615 | 0.5416 | 990 | 1.1263 | 53307232 |
0.0752 | 0.5443 | 995 | 1.1275 | 53579368 |
0.131 | 0.5471 | 1000 | 1.1288 | 53851264 |
0.0688 | 0.5498 | 1005 | 1.1296 | 54118360 |
0.1028 | 0.5526 | 1010 | 1.1283 | 54388792 |
0.1286 | 0.5553 | 1015 | 1.1263 | 54649048 |
0.1078 | 0.5580 | 1020 | 1.1243 | 54911104 |
0.1046 | 0.5608 | 1025 | 1.1260 | 55177560 |
0.1033 | 0.5635 | 1030 | 1.1274 | 55450744 |
0.1 | 0.5662 | 1035 | 1.1249 | 55720440 |
0.0748 | 0.5690 | 1040 | 1.1250 | 55995040 |
0.1346 | 0.5717 | 1045 | 1.1253 | 56272760 |
0.1358 | 0.5744 | 1050 | 1.1248 | 56548912 |
0.1207 | 0.5772 | 1055 | 1.1265 | 56816872 |
0.1846 | 0.5799 | 1060 | 1.1280 | 57079520 |
0.1064 | 0.5826 | 1065 | 1.1259 | 57346080 |
0.0912 | 0.5854 | 1070 | 1.1232 | 57613104 |
0.0811 | 0.5881 | 1075 | 1.1243 | 57877976 |
0.1331 | 0.5909 | 1080 | 1.1235 | 58157936 |
0.0908 | 0.5936 | 1085 | 1.1234 | 58427112 |
0.1493 | 0.5963 | 1090 | 1.1230 | 58696168 |
0.0947 | 0.5991 | 1095 | 1.1224 | 58964768 |
0.0883 | 0.6018 | 1100 | 1.1225 | 59233552 |
0.1224 | 0.6045 | 1105 | 1.1237 | 59508416 |
0.0844 | 0.6073 | 1110 | 1.1243 | 59780056 |
0.1231 | 0.6100 | 1115 | 1.1219 | 60053512 |
0.0704 | 0.6127 | 1120 | 1.1228 | 60323992 |
0.1217 | 0.6155 | 1125 | 1.1247 | 60591480 |
0.1333 | 0.6182 | 1130 | 1.1247 | 60860808 |
0.1773 | 0.6209 | 1135 | 1.1233 | 61129280 |
0.0739 | 0.6237 | 1140 | 1.1230 | 61396264 |
0.1076 | 0.6264 | 1145 | 1.1237 | 61676200 |
0.1018 | 0.6291 | 1150 | 1.1227 | 61939504 |
0.0889 | 0.6319 | 1155 | 1.1217 | 62206840 |
0.0848 | 0.6346 | 1160 | 1.1220 | 62479304 |
0.1288 | 0.6374 | 1165 | 1.1210 | 62744128 |
0.1336 | 0.6401 | 1170 | 1.1189 | 63009680 |
0.1311 | 0.6428 | 1175 | 1.1220 | 63283488 |
0.0721 | 0.6456 | 1180 | 1.1224 | 63551544 |
0.0833 | 0.6483 | 1185 | 1.1202 | 63828120 |
0.159 | 0.6510 | 1190 | 1.1216 | 64097744 |
0.1364 | 0.6538 | 1195 | 1.1222 | 64365456 |
0.122 | 0.6565 | 1200 | 1.1217 | 64639512 |
0.0556 | 0.6592 | 1205 | 1.1209 | 64917008 |
0.0958 | 0.6620 | 1210 | 1.1224 | 65186600 |
0.1396 | 0.6647 | 1215 | 1.1221 | 65457240 |
0.1406 | 0.6674 | 1220 | 1.1219 | 65724384 |
0.183 | 0.6702 | 1225 | 1.1201 | 65991472 |
0.1442 | 0.6729 | 1230 | 1.1202 | 66247368 |
0.1432 | 0.6756 | 1235 | 1.1193 | 66519352 |
0.1047 | 0.6784 | 1240 | 1.1177 | 66793464 |
0.1665 | 0.6811 | 1245 | 1.1211 | 67057776 |
0.1044 | 0.6839 | 1250 | 1.1204 | 67336200 |
0.1104 | 0.6866 | 1255 | 1.1177 | 67599384 |
0.1749 | 0.6893 | 1260 | 1.1179 | 67867712 |
0.0799 | 0.6921 | 1265 | 1.1200 | 68128816 |
0.0796 | 0.6948 | 1270 | 1.1188 | 68394968 |
0.1661 | 0.6975 | 1275 | 1.1176 | 68670640 |
0.0956 | 0.7003 | 1280 | 1.1166 | 68940800 |
0.1501 | 0.7030 | 1285 | 1.1181 | 69209672 |
0.1093 | 0.7057 | 1290 | 1.1189 | 69479976 |
0.0632 | 0.7085 | 1295 | 1.1171 | 69747392 |
0.1077 | 0.7112 | 1300 | 1.1172 | 70015144 |
0.0881 | 0.7139 | 1305 | 1.1179 | 70285856 |
0.0972 | 0.7167 | 1310 | 1.1200 | 70554584 |
0.107 | 0.7194 | 1315 | 1.1186 | 70823800 |
0.1226 | 0.7222 | 1320 | 1.1184 | 71086808 |
0.1196 | 0.7249 | 1325 | 1.1198 | 71361488 |
0.088 | 0.7276 | 1330 | 1.1188 | 71640264 |
0.102 | 0.7304 | 1335 | 1.1177 | 71912832 |
0.1277 | 0.7331 | 1340 | 1.1172 | 72182800 |
0.1844 | 0.7358 | 1345 | 1.1174 | 72448048 |
0.1159 | 0.7386 | 1350 | 1.1191 | 72715976 |
0.1276 | 0.7413 | 1355 | 1.1187 | 72990168 |
0.0715 | 0.7440 | 1360 | 1.1169 | 73255144 |
0.142 | 0.7468 | 1365 | 1.1169 | 73516344 |
0.158 | 0.7495 | 1370 | 1.1182 | 73794232 |
0.1622 | 0.7522 | 1375 | 1.1167 | 74065848 |
0.1652 | 0.7550 | 1380 | 1.1155 | 74333144 |
0.1542 | 0.7577 | 1385 | 1.1164 | 74599680 |
0.1395 | 0.7604 | 1390 | 1.1150 | 74868184 |
0.0988 | 0.7632 | 1395 | 1.1163 | 75131992 |
0.0901 | 0.7659 | 1400 | 1.1163 | 75396456 |
0.1068 | 0.7687 | 1405 | 1.1165 | 75659952 |
0.1644 | 0.7714 | 1410 | 1.1169 | 75936512 |
0.1194 | 0.7741 | 1415 | 1.1158 | 76201496 |
0.1356 | 0.7769 | 1420 | 1.1134 | 76478872 |
0.1126 | 0.7796 | 1425 | 1.1144 | 76746472 |
0.0919 | 0.7823 | 1430 | 1.1150 | 77017520 |
0.1469 | 0.7851 | 1435 | 1.1155 | 77284784 |
0.1433 | 0.7878 | 1440 | 1.1149 | 77551680 |
0.0743 | 0.7905 | 1445 | 1.1138 | 77821136 |
0.1669 | 0.7933 | 1450 | 1.1142 | 78093368 |
0.1076 | 0.7960 | 1455 | 1.1145 | 78362664 |
0.0973 | 0.7987 | 1460 | 1.1113 | 78635256 |
0.0809 | 0.8015 | 1465 | 1.1130 | 78904024 |
0.0797 | 0.8042 | 1470 | 1.1152 | 79176312 |
0.1293 | 0.8069 | 1475 | 1.1165 | 79444528 |
0.1529 | 0.8097 | 1480 | 1.1143 | 79718240 |
0.1049 | 0.8124 | 1485 | 1.1136 | 79991048 |
0.0838 | 0.8152 | 1490 | 1.1137 | 80259104 |
0.1662 | 0.8179 | 1495 | 1.1144 | 80530856 |
0.1528 | 0.8206 | 1500 | 1.1128 | 80796672 |
0.0886 | 0.8234 | 1505 | 1.1138 | 81055744 |
0.0816 | 0.8261 | 1510 | 1.1160 | 81325704 |
0.1051 | 0.8288 | 1515 | 1.1127 | 81598344 |
0.1674 | 0.8316 | 1520 | 1.1123 | 81866048 |
0.1408 | 0.8343 | 1525 | 1.1113 | 82139456 |
0.1149 | 0.8370 | 1530 | 1.1134 | 82424128 |
0.1034 | 0.8398 | 1535 | 1.1147 | 82693032 |
0.0741 | 0.8425 | 1540 | 1.1131 | 82962848 |
0.1875 | 0.8452 | 1545 | 1.1126 | 83233240 |
0.1677 | 0.8480 | 1550 | 1.1145 | 83504368 |
0.1579 | 0.8507 | 1555 | 1.1140 | 83775296 |
0.13 | 0.8535 | 1560 | 1.1137 | 84048072 |
0.0949 | 0.8562 | 1565 | 1.1117 | 84320136 |
0.1196 | 0.8589 | 1570 | 1.1126 | 84584048 |
0.126 | 0.8617 | 1575 | 1.1116 | 84856504 |
0.0681 | 0.8644 | 1580 | 1.1104 | 85127256 |
0.1589 | 0.8671 | 1585 | 1.1110 | 85391472 |
0.1435 | 0.8699 | 1590 | 1.1120 | 85670424 |
0.1237 | 0.8726 | 1595 | 1.1115 | 85933648 |
0.1618 | 0.8753 | 1600 | 1.1128 | 86205312 |
0.0772 | 0.8781 | 1605 | 1.1142 | 86474080 |
0.138 | 0.8808 | 1610 | 1.1137 | 86747368 |
0.1592 | 0.8835 | 1615 | 1.1122 | 87022320 |
0.1275 | 0.8863 | 1620 | 1.1115 | 87281000 |
0.0641 | 0.8890 | 1625 | 1.1114 | 87544016 |
0.1032 | 0.8917 | 1630 | 1.1112 | 87813640 |
0.1413 | 0.8945 | 1635 | 1.1123 | 88083872 |
0.1243 | 0.8972 | 1640 | 1.1095 | 88363088 |
0.1232 | 0.9000 | 1645 | 1.1105 | 88638376 |
0.1382 | 0.9027 | 1650 | 1.1098 | 88906088 |
0.1171 | 0.9054 | 1655 | 1.1087 | 89179912 |
0.1533 | 0.9082 | 1660 | 1.1117 | 89448520 |
0.1143 | 0.9109 | 1665 | 1.1124 | 89718944 |
0.1055 | 0.9136 | 1670 | 1.1099 | 89987376 |
0.111 | 0.9164 | 1675 | 1.1090 | 90265384 |
0.0757 | 0.9191 | 1680 | 1.1111 | 90531896 |
0.1295 | 0.9218 | 1685 | 1.1122 | 90801128 |
0.1262 | 0.9246 | 1690 | 1.1090 | 91080224 |
0.1637 | 0.9273 | 1695 | 1.1088 | 91346424 |
0.0961 | 0.9300 | 1700 | 1.1094 | 91620224 |
0.1968 | 0.9328 | 1705 | 1.1078 | 91891560 |
0.1257 | 0.9355 | 1710 | 1.1094 | 92164248 |
0.1734 | 0.9382 | 1715 | 1.1109 | 92435768 |
0.1424 | 0.9410 | 1720 | 1.1101 | 92704608 |
0.1874 | 0.9437 | 1725 | 1.1099 | 92983192 |
0.1318 | 0.9465 | 1730 | 1.1084 | 93243792 |
0.1153 | 0.9492 | 1735 | 1.1086 | 93517248 |
0.1283 | 0.9519 | 1740 | 1.1081 | 93791240 |
0.1191 | 0.9547 | 1745 | 1.1075 | 94059424 |
0.1424 | 0.9574 | 1750 | 1.1089 | 94326944 |
0.1458 | 0.9601 | 1755 | 1.1093 | 94594848 |
0.1361 | 0.9629 | 1760 | 1.1081 | 94861000 |
0.1238 | 0.9656 | 1765 | 1.1060 | 95141936 |
0.1341 | 0.9683 | 1770 | 1.1077 | 95409544 |
0.2192 | 0.9711 | 1775 | 1.1079 | 95681696 |
0.1611 | 0.9738 | 1780 | 1.1059 | 95946160 |
0.088 | 0.9765 | 1785 | 1.1061 | 96209080 |
0.1224 | 0.9793 | 1790 | 1.1095 | 96476168 |
0.109 | 0.9820 | 1795 | 1.1088 | 96749632 |
0.0898 | 0.9848 | 1800 | 1.1047 | 97023680 |
0.0734 | 0.9875 | 1805 | 1.1051 | 97298760 |
0.0984 | 0.9902 | 1810 | 1.1070 | 97563008 |
0.1193 | 0.9930 | 1815 | 1.1065 | 97826504 |
0.1278 | 0.9957 | 1820 | 1.1063 | 98091392 |
0.1163 | 0.9984 | 1825 | 1.1074 | 98363120 |
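The table above can also be read programmatically, for example to find the step with the lowest validation loss or the average number of logged input tokens per optimizer step. Each row is parsed as pipe-separated values. This minimal sketch uses a handful of rows copied verbatim from the table. Pasting the full table body into the hypothetical ROWS string runs it over the whole log.

```python
# A few rows copied verbatim from the table above (not the full log).
ROWS = """\
No log | 0 | 0 | 1.3909 | 0
1.566 | 0.0027 | 5 | 1.3903 | 269240
1.0687 | 0.0246 | 45 | 1.2000 | 2413576
0.5734 | 0.0410 | 75 | 1.3074 | 4008872
0.0898 | 0.9848 | 1800 | 1.1047 | 97023680
0.1163 | 0.9984 | 1825 | 1.1074 | 98363120
"""

def parse_rows(text):
    """Yield (train_loss or None, epoch, step, val_loss, tokens_seen) for each row."""
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        train = None if cells[0] == "No log" else float(cells[0])
        yield train, float(cells[1]), int(cells[2]), float(cells[3]), int(cells[4])

rows = list(parse_rows(ROWS))

# Row with the lowest validation loss (column index 3).
best = min(rows, key=lambda r: r[3])
print("lowest validation loss:", best[3], "at step", best[2])

# Average logged input tokens per optimizer step between the first and last logged steps.
first, last = rows[1], rows[-1]
tokens_per_step = (last[4] - first[4]) / (last[2] - first[2])
print(f"~{tokens_per_step:,.0f} input tokens per optimizer step")
```

Over the full table, the lowest validation loss is 1.1047 at step 1800. This is slightly below the final evaluation loss of 1.1080 reported at the top of the card. Between steps 5 and 1825, the log averages about 53,900 input tokens per optimizer step.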
Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
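To check whether a local environment matches the versions listed above, the installed distributions can be compared using importlib.metadata from the Python standard library. This is a sketch. The distribution names are assumptions about how the packages were installed, notably torch for PyTorch. The comparison is also a strict string match, so a CPU-only 2.4.0 build of PyTorch will be reported as differing from 2.4.0+cu121.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions" on this card.
# Assumption: these are the installed distribution names.
CARD_VERSIONS = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}

def compare_versions(expected, lookup=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except PackageNotFoundError:
            installed = None  # package not installed in this environment
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches

if __name__ == "__main__":
    for package, (wanted, installed) in compare_versions(CARD_VERSIONS).items():
        print(f"{package}: card lists {wanted}, found {installed or 'not installed'}")
```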
Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter19_sftsd2
- Base model: google/gemma-2-2b