collapse_gemma-2-2b_hs2_accumulate_iter19_sftsd0
This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0970
- Num Input Tokens Seen: 99323720
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
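Several of these values depend on one another. As a rough sketch (not the original training script), the snippet below recomputes the effective batch size, estimates the warmup length from the last step and epoch logged in the training results, and models a constant-with-warmup schedule. The single-device setup, the ceiling rounding of the warmup step count, and the linear warmup shape are assumptions, and `lr_at` is a hypothetical helper name.

```python
import math

# Values copied from the hyperparameter list.
learning_rate = 8e-06
train_batch_size = 8
gradient_accumulation_steps = 16
warmup_ratio = 0.05

# Assumption: a single device, which is what 8 * 16 = 128 implies.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Last logged row of the training results: step, epoch, input tokens seen.
last_step, last_epoch, last_tokens = 1825, 0.9975, 99_106_032

# Estimated optimizer steps in the single epoch (the logged epoch is rounded).
est_total_steps = last_step / last_epoch

# Assumption: warmup steps are the ceiling of ratio * total steps.
warmup_steps = math.ceil(warmup_ratio * est_total_steps)

# Average input tokens per sequence, derived from the logged token count.
tokens_per_sequence = last_tokens / last_step / total_train_batch_size


def lr_at(step: int) -> float:
    """Hypothetical helper: linear ramp to the peak rate, then constant."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print("effective batch size:", total_train_batch_size)
print("estimated warmup steps:", warmup_steps)
print("avg tokens per sequence:", round(tokens_per_sequence))
print("lr at steps 0, mid-warmup, 1000:",
      lr_at(0), lr_at(warmup_steps // 2), lr_at(1000))
```

Because the schedule stays constant after warmup, every step past the estimated warmup boundary uses the full learning rate of 8e-06 under these assumptions.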
Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.6806 | 0.0027 | 5 | 1.3903 | 284224 |
1.7742 | 0.0055 | 10 | 1.3839 | 558216 |
1.6345 | 0.0082 | 15 | 1.3635 | 825488 |
1.6429 | 0.0109 | 20 | 1.3346 | 1098184 |
1.5694 | 0.0137 | 25 | 1.2949 | 1362824 |
1.4208 | 0.0164 | 30 | 1.2550 | 1629968 |
1.361 | 0.0191 | 35 | 1.2330 | 1899384 |
1.321 | 0.0219 | 40 | 1.2076 | 2168232 |
1.2729 | 0.0246 | 45 | 1.1994 | 2444472 |
1.0367 | 0.0273 | 50 | 1.2203 | 2720200 |
0.9223 | 0.0301 | 55 | 1.2427 | 3000168 |
0.7826 | 0.0328 | 60 | 1.2884 | 3266176 |
0.6697 | 0.0355 | 65 | 1.3139 | 3531600 |
0.5236 | 0.0383 | 70 | 1.3491 | 3804176 |
0.5492 | 0.0410 | 75 | 1.3487 | 4077408 |
0.4101 | 0.0437 | 80 | 1.3692 | 4354936 |
0.4036 | 0.0465 | 85 | 1.3093 | 4629656 |
0.3022 | 0.0492 | 90 | 1.2783 | 4895720 |
0.3021 | 0.0519 | 95 | 1.2753 | 5159064 |
0.2682 | 0.0547 | 100 | 1.2381 | 5431472 |
0.1918 | 0.0574 | 105 | 1.2649 | 5704224 |
0.2585 | 0.0601 | 110 | 1.2237 | 5973536 |
0.2467 | 0.0629 | 115 | 1.2389 | 6236216 |
0.2346 | 0.0656 | 120 | 1.2144 | 6515656 |
0.2006 | 0.0683 | 125 | 1.2188 | 6787920 |
0.1831 | 0.0711 | 130 | 1.2170 | 7069464 |
0.1946 | 0.0738 | 135 | 1.2247 | 7339480 |
0.1731 | 0.0765 | 140 | 1.2124 | 7613304 |
0.1714 | 0.0793 | 145 | 1.2085 | 7887024 |
0.2256 | 0.0820 | 150 | 1.2029 | 8160568 |
0.1567 | 0.0847 | 155 | 1.2028 | 8429984 |
0.1887 | 0.0875 | 160 | 1.2032 | 8688816 |
0.2477 | 0.0902 | 165 | 1.2059 | 8955672 |
0.1097 | 0.0929 | 170 | 1.1974 | 9226992 |
0.1214 | 0.0957 | 175 | 1.1946 | 9497088 |
0.211 | 0.0984 | 180 | 1.1902 | 9766280 |
0.1582 | 0.1011 | 185 | 1.1887 | 10028512 |
0.172 | 0.1039 | 190 | 1.1875 | 10301240 |
0.1915 | 0.1066 | 195 | 1.1814 | 10569376 |
0.1859 | 0.1093 | 200 | 1.1856 | 10839888 |
0.1392 | 0.1121 | 205 | 1.1804 | 11113656 |
0.186 | 0.1148 | 210 | 1.1752 | 11388048 |
0.2317 | 0.1175 | 215 | 1.1797 | 11656800 |
0.1651 | 0.1203 | 220 | 1.1790 | 11928832 |
0.152 | 0.1230 | 225 | 1.1693 | 12197776 |
0.1357 | 0.1257 | 230 | 1.1733 | 12457064 |
0.1073 | 0.1285 | 235 | 1.1760 | 12722408 |
0.1327 | 0.1312 | 240 | 1.1739 | 12989168 |
0.1879 | 0.1339 | 245 | 1.1765 | 13256632 |
0.0982 | 0.1366 | 250 | 1.1713 | 13523920 |
0.1909 | 0.1394 | 255 | 1.1673 | 13793720 |
0.168 | 0.1421 | 260 | 1.1682 | 14064608 |
0.1839 | 0.1448 | 265 | 1.1659 | 14341656 |
0.1328 | 0.1476 | 270 | 1.1651 | 14620592 |
0.1309 | 0.1503 | 275 | 1.1631 | 14889240 |
0.119 | 0.1530 | 280 | 1.1640 | 15156896 |
0.1204 | 0.1558 | 285 | 1.1634 | 15424144 |
0.0846 | 0.1585 | 290 | 1.1616 | 15697056 |
0.1399 | 0.1612 | 295 | 1.1688 | 15966720 |
0.2202 | 0.1640 | 300 | 1.1599 | 16243944 |
0.1488 | 0.1667 | 305 | 1.1592 | 16517880 |
0.1544 | 0.1694 | 310 | 1.1606 | 16783704 |
0.1309 | 0.1722 | 315 | 1.1595 | 17049872 |
0.1789 | 0.1749 | 320 | 1.1562 | 17325872 |
0.0821 | 0.1776 | 325 | 1.1558 | 17600728 |
0.1737 | 0.1804 | 330 | 1.1585 | 17873864 |
0.1194 | 0.1831 | 335 | 1.1581 | 18143992 |
0.1852 | 0.1858 | 340 | 1.1574 | 18415128 |
0.1555 | 0.1886 | 345 | 1.1582 | 18683952 |
0.1734 | 0.1913 | 350 | 1.1601 | 18956880 |
0.0801 | 0.1940 | 355 | 1.1545 | 19228184 |
0.1157 | 0.1968 | 360 | 1.1575 | 19501984 |
0.187 | 0.1995 | 365 | 1.1557 | 19778776 |
0.1558 | 0.2022 | 370 | 1.1525 | 20047712 |
0.1624 | 0.2050 | 375 | 1.1518 | 20324752 |
0.1013 | 0.2077 | 380 | 1.1544 | 20596176 |
0.1261 | 0.2104 | 385 | 1.1527 | 20865504 |
0.1974 | 0.2132 | 390 | 1.1497 | 21134096 |
0.0918 | 0.2159 | 395 | 1.1520 | 21411408 |
0.0961 | 0.2186 | 400 | 1.1496 | 21679776 |
0.0914 | 0.2214 | 405 | 1.1482 | 21948928 |
0.161 | 0.2241 | 410 | 1.1473 | 22218896 |
0.0996 | 0.2268 | 415 | 1.1467 | 22487448 |
0.0868 | 0.2296 | 420 | 1.1502 | 22753056 |
0.1084 | 0.2323 | 425 | 1.1463 | 23021752 |
0.1158 | 0.2350 | 430 | 1.1460 | 23285432 |
0.1514 | 0.2378 | 435 | 1.1500 | 23560376 |
0.1295 | 0.2405 | 440 | 1.1488 | 23833312 |
0.1886 | 0.2432 | 445 | 1.1458 | 24106832 |
0.1567 | 0.2460 | 450 | 1.1430 | 24377824 |
0.1654 | 0.2487 | 455 | 1.1447 | 24651848 |
0.142 | 0.2514 | 460 | 1.1439 | 24921904 |
0.0862 | 0.2542 | 465 | 1.1449 | 25194840 |
0.1242 | 0.2569 | 470 | 1.1491 | 25465280 |
0.1765 | 0.2596 | 475 | 1.1439 | 25733064 |
0.1443 | 0.2624 | 480 | 1.1403 | 26005904 |
0.1769 | 0.2651 | 485 | 1.1433 | 26275104 |
0.1434 | 0.2678 | 490 | 1.1458 | 26546208 |
0.1271 | 0.2706 | 495 | 1.1413 | 26820696 |
0.1386 | 0.2733 | 500 | 1.1418 | 27090816 |
0.0905 | 0.2760 | 505 | 1.1414 | 27366352 |
0.1208 | 0.2788 | 510 | 1.1432 | 27641392 |
0.0852 | 0.2815 | 515 | 1.1392 | 27911280 |
0.1129 | 0.2842 | 520 | 1.1432 | 28187640 |
0.1168 | 0.2870 | 525 | 1.1451 | 28468920 |
0.1317 | 0.2897 | 530 | 1.1384 | 28739904 |
0.1506 | 0.2924 | 535 | 1.1368 | 29012856 |
0.1286 | 0.2952 | 540 | 1.1386 | 29287216 |
0.105 | 0.2979 | 545 | 1.1401 | 29561832 |
0.0946 | 0.3006 | 550 | 1.1360 | 29834776 |
0.0904 | 0.3034 | 555 | 1.1380 | 30113000 |
0.1205 | 0.3061 | 560 | 1.1390 | 30388320 |
0.163 | 0.3088 | 565 | 1.1373 | 30657400 |
0.1169 | 0.3116 | 570 | 1.1349 | 30934392 |
0.1544 | 0.3143 | 575 | 1.1373 | 31209744 |
0.1657 | 0.3170 | 580 | 1.1382 | 31485168 |
0.0994 | 0.3198 | 585 | 1.1346 | 31755520 |
0.1924 | 0.3225 | 590 | 1.1331 | 32028760 |
0.0992 | 0.3252 | 595 | 1.1353 | 32298744 |
0.1465 | 0.3280 | 600 | 1.1335 | 32565768 |
0.163 | 0.3307 | 605 | 1.1320 | 32831128 |
0.1784 | 0.3334 | 610 | 1.1362 | 33104192 |
0.1578 | 0.3362 | 615 | 1.1364 | 33371160 |
0.1466 | 0.3389 | 620 | 1.1304 | 33643032 |
0.1322 | 0.3416 | 625 | 1.1284 | 33912272 |
0.1916 | 0.3444 | 630 | 1.1332 | 34189856 |
0.1161 | 0.3471 | 635 | 1.1325 | 34465248 |
0.1703 | 0.3498 | 640 | 1.1294 | 34737192 |
0.1532 | 0.3526 | 645 | 1.1309 | 35007224 |
0.0963 | 0.3553 | 650 | 1.1332 | 35276816 |
0.1691 | 0.3580 | 655 | 1.1314 | 35545624 |
0.1953 | 0.3608 | 660 | 1.1283 | 35812824 |
0.0758 | 0.3635 | 665 | 1.1288 | 36088744 |
0.1302 | 0.3662 | 670 | 1.1289 | 36364208 |
0.1075 | 0.3690 | 675 | 1.1281 | 36639360 |
0.1476 | 0.3717 | 680 | 1.1285 | 36909888 |
0.1481 | 0.3744 | 685 | 1.1322 | 37186216 |
0.1975 | 0.3772 | 690 | 1.1313 | 37455192 |
0.1224 | 0.3799 | 695 | 1.1269 | 37727416 |
0.1354 | 0.3826 | 700 | 1.1292 | 38004976 |
0.1068 | 0.3854 | 705 | 1.1299 | 38275720 |
0.1196 | 0.3881 | 710 | 1.1273 | 38547192 |
0.1703 | 0.3908 | 715 | 1.1268 | 38821296 |
0.1107 | 0.3936 | 720 | 1.1250 | 39095736 |
0.1168 | 0.3963 | 725 | 1.1259 | 39366480 |
0.0984 | 0.3990 | 730 | 1.1302 | 39635736 |
0.1674 | 0.4017 | 735 | 1.1312 | 39906536 |
0.1025 | 0.4045 | 740 | 1.1270 | 40176720 |
0.0845 | 0.4072 | 745 | 1.1259 | 40447656 |
0.1321 | 0.4099 | 750 | 1.1264 | 40718536 |
0.1485 | 0.4127 | 755 | 1.1270 | 40989832 |
0.1375 | 0.4154 | 760 | 1.1279 | 41259888 |
0.0981 | 0.4181 | 765 | 1.1247 | 41526248 |
0.131 | 0.4209 | 770 | 1.1239 | 41788168 |
0.1015 | 0.4236 | 775 | 1.1243 | 42063792 |
0.0936 | 0.4263 | 780 | 1.1277 | 42333600 |
0.1605 | 0.4291 | 785 | 1.1237 | 42605512 |
0.106 | 0.4318 | 790 | 1.1215 | 42886072 |
0.1686 | 0.4345 | 795 | 1.1214 | 43151880 |
0.1057 | 0.4373 | 800 | 1.1224 | 43418768 |
0.1102 | 0.4400 | 805 | 1.1228 | 43692496 |
0.1525 | 0.4427 | 810 | 1.1252 | 43963272 |
0.1133 | 0.4455 | 815 | 1.1233 | 44234056 |
0.099 | 0.4482 | 820 | 1.1225 | 44507992 |
0.0989 | 0.4509 | 825 | 1.1246 | 44783280 |
0.1051 | 0.4537 | 830 | 1.1248 | 45056384 |
0.1048 | 0.4564 | 835 | 1.1231 | 45327640 |
0.1623 | 0.4591 | 840 | 1.1211 | 45603080 |
0.1041 | 0.4619 | 845 | 1.1216 | 45878640 |
0.11 | 0.4646 | 850 | 1.1213 | 46150632 |
0.087 | 0.4673 | 855 | 1.1211 | 46421888 |
0.1189 | 0.4701 | 860 | 1.1219 | 46696968 |
0.0921 | 0.4728 | 865 | 1.1224 | 46975480 |
0.1594 | 0.4755 | 870 | 1.1204 | 47252720 |
0.1065 | 0.4783 | 875 | 1.1213 | 47524408 |
0.1697 | 0.4810 | 880 | 1.1207 | 47795896 |
0.12 | 0.4837 | 885 | 1.1191 | 48068736 |
0.1062 | 0.4865 | 890 | 1.1196 | 48338920 |
0.0921 | 0.4892 | 895 | 1.1215 | 48602256 |
0.1828 | 0.4919 | 900 | 1.1193 | 48876312 |
0.1058 | 0.4947 | 905 | 1.1177 | 49153904 |
0.0609 | 0.4974 | 910 | 1.1194 | 49420248 |
0.1143 | 0.5001 | 915 | 1.1215 | 49689184 |
0.1058 | 0.5029 | 920 | 1.1220 | 49955584 |
0.0983 | 0.5056 | 925 | 1.1182 | 50232384 |
0.0694 | 0.5083 | 930 | 1.1175 | 50508656 |
0.0878 | 0.5111 | 935 | 1.1175 | 50783144 |
0.0897 | 0.5138 | 940 | 1.1156 | 51054808 |
0.1236 | 0.5165 | 945 | 1.1169 | 51321728 |
0.0832 | 0.5193 | 950 | 1.1183 | 51599160 |
0.1172 | 0.5220 | 955 | 1.1195 | 51868016 |
0.1299 | 0.5247 | 960 | 1.1178 | 52137240 |
0.0812 | 0.5275 | 965 | 1.1152 | 52413584 |
0.1477 | 0.5302 | 970 | 1.1160 | 52680720 |
0.0942 | 0.5329 | 975 | 1.1177 | 52955320 |
0.1192 | 0.5357 | 980 | 1.1157 | 53224360 |
0.0978 | 0.5384 | 985 | 1.1143 | 53500504 |
0.1351 | 0.5411 | 990 | 1.1152 | 53763400 |
0.1338 | 0.5439 | 995 | 1.1161 | 54034432 |
0.1006 | 0.5466 | 1000 | 1.1160 | 54303536 |
0.0671 | 0.5493 | 1005 | 1.1141 | 54569320 |
0.0844 | 0.5521 | 1010 | 1.1139 | 54842808 |
0.0774 | 0.5548 | 1015 | 1.1159 | 55118368 |
0.0987 | 0.5575 | 1020 | 1.1158 | 55388808 |
0.153 | 0.5603 | 1025 | 1.1154 | 55668632 |
0.102 | 0.5630 | 1030 | 1.1170 | 55937848 |
0.1187 | 0.5657 | 1035 | 1.1155 | 56203848 |
0.0912 | 0.5685 | 1040 | 1.1147 | 56479760 |
0.0758 | 0.5712 | 1045 | 1.1165 | 56749200 |
0.1541 | 0.5739 | 1050 | 1.1148 | 57020200 |
0.1152 | 0.5767 | 1055 | 1.1121 | 57285080 |
0.1403 | 0.5794 | 1060 | 1.1134 | 57548280 |
0.1048 | 0.5821 | 1065 | 1.1133 | 57826080 |
0.0905 | 0.5849 | 1070 | 1.1120 | 58093712 |
0.127 | 0.5876 | 1075 | 1.1114 | 58364480 |
0.1293 | 0.5903 | 1080 | 1.1141 | 58629272 |
0.1131 | 0.5931 | 1085 | 1.1141 | 58898040 |
0.0855 | 0.5958 | 1090 | 1.1154 | 59170688 |
0.1093 | 0.5985 | 1095 | 1.1154 | 59438904 |
0.0939 | 0.6013 | 1100 | 1.1134 | 59709000 |
0.1225 | 0.6040 | 1105 | 1.1153 | 59983096 |
0.128 | 0.6067 | 1110 | 1.1143 | 60258088 |
0.1065 | 0.6095 | 1115 | 1.1132 | 60526544 |
0.1202 | 0.6122 | 1120 | 1.1136 | 60798992 |
0.194 | 0.6149 | 1125 | 1.1130 | 61070152 |
0.1272 | 0.6177 | 1130 | 1.1130 | 61340768 |
0.1141 | 0.6204 | 1135 | 1.1127 | 61615800 |
0.109 | 0.6231 | 1140 | 1.1117 | 61880432 |
0.1224 | 0.6259 | 1145 | 1.1124 | 62155328 |
0.1201 | 0.6286 | 1150 | 1.1124 | 62427232 |
0.1465 | 0.6313 | 1155 | 1.1114 | 62698008 |
0.0914 | 0.6341 | 1160 | 1.1105 | 62973120 |
0.0932 | 0.6368 | 1165 | 1.1118 | 63239584 |
0.0525 | 0.6395 | 1170 | 1.1129 | 63510000 |
0.0754 | 0.6423 | 1175 | 1.1122 | 63785568 |
0.127 | 0.6450 | 1180 | 1.1133 | 64058008 |
0.1313 | 0.6477 | 1185 | 1.1105 | 64334312 |
0.1512 | 0.6505 | 1190 | 1.1086 | 64603792 |
0.0959 | 0.6532 | 1195 | 1.1098 | 64881232 |
0.0798 | 0.6559 | 1200 | 1.1122 | 65152216 |
0.0752 | 0.6586 | 1205 | 1.1127 | 65426696 |
0.0975 | 0.6614 | 1210 | 1.1113 | 65686664 |
0.0834 | 0.6641 | 1215 | 1.1087 | 65959016 |
0.0785 | 0.6668 | 1220 | 1.1093 | 66232792 |
0.1468 | 0.6696 | 1225 | 1.1092 | 66501224 |
0.1078 | 0.6723 | 1230 | 1.1098 | 66767656 |
0.1119 | 0.6750 | 1235 | 1.1091 | 67032392 |
0.05 | 0.6778 | 1240 | 1.1100 | 67292424 |
0.1021 | 0.6805 | 1245 | 1.1086 | 67561984 |
0.1359 | 0.6832 | 1250 | 1.1091 | 67833584 |
0.1364 | 0.6860 | 1255 | 1.1092 | 68113240 |
0.0851 | 0.6887 | 1260 | 1.1071 | 68383304 |
0.1273 | 0.6914 | 1265 | 1.1072 | 68651440 |
0.1269 | 0.6942 | 1270 | 1.1077 | 68928528 |
0.1332 | 0.6969 | 1275 | 1.1060 | 69207336 |
0.0966 | 0.6996 | 1280 | 1.1053 | 69487616 |
0.1186 | 0.7024 | 1285 | 1.1071 | 69758992 |
0.1416 | 0.7051 | 1290 | 1.1065 | 70032952 |
0.1234 | 0.7078 | 1295 | 1.1058 | 70305032 |
0.1087 | 0.7106 | 1300 | 1.1065 | 70575760 |
0.1219 | 0.7133 | 1305 | 1.1069 | 70845352 |
0.1054 | 0.7160 | 1310 | 1.1074 | 71108984 |
0.1203 | 0.7188 | 1315 | 1.1072 | 71384016 |
0.1303 | 0.7215 | 1320 | 1.1082 | 71652416 |
0.1042 | 0.7242 | 1325 | 1.1093 | 71927208 |
0.06 | 0.7270 | 1330 | 1.1075 | 72203544 |
0.0828 | 0.7297 | 1335 | 1.1073 | 72479272 |
0.1459 | 0.7324 | 1340 | 1.1078 | 72757816 |
0.0883 | 0.7352 | 1345 | 1.1071 | 73025888 |
0.0948 | 0.7379 | 1350 | 1.1067 | 73291272 |
0.1146 | 0.7406 | 1355 | 1.1067 | 73561944 |
0.0853 | 0.7434 | 1360 | 1.1066 | 73839224 |
0.0937 | 0.7461 | 1365 | 1.1059 | 74107504 |
0.093 | 0.7488 | 1370 | 1.1061 | 74372408 |
0.1183 | 0.7516 | 1375 | 1.1060 | 74641384 |
0.177 | 0.7543 | 1380 | 1.1081 | 74914624 |
0.1008 | 0.7570 | 1385 | 1.1050 | 75190120 |
0.0639 | 0.7598 | 1390 | 1.1041 | 75462616 |
0.1602 | 0.7625 | 1395 | 1.1071 | 75731352 |
0.1118 | 0.7652 | 1400 | 1.1087 | 76000888 |
0.0994 | 0.7680 | 1405 | 1.1059 | 76265640 |
0.1123 | 0.7707 | 1410 | 1.1032 | 76540208 |
0.1006 | 0.7734 | 1415 | 1.1033 | 76805672 |
0.1308 | 0.7762 | 1420 | 1.1059 | 77070744 |
0.0908 | 0.7789 | 1425 | 1.1066 | 77348592 |
0.1243 | 0.7816 | 1430 | 1.1051 | 77620720 |
0.0688 | 0.7844 | 1435 | 1.1039 | 77892200 |
0.1306 | 0.7871 | 1440 | 1.1036 | 78162688 |
0.1104 | 0.7898 | 1445 | 1.1052 | 78438264 |
0.1566 | 0.7926 | 1450 | 1.1061 | 78710120 |
0.1533 | 0.7953 | 1455 | 1.1047 | 78981544 |
0.1274 | 0.7980 | 1460 | 1.1022 | 79252016 |
0.0912 | 0.8008 | 1465 | 1.1029 | 79521600 |
0.1074 | 0.8035 | 1470 | 1.1042 | 79797688 |
0.1111 | 0.8062 | 1475 | 1.1035 | 80068680 |
0.1071 | 0.8090 | 1480 | 1.1029 | 80340744 |
0.1493 | 0.8117 | 1485 | 1.1027 | 80613512 |
0.0678 | 0.8144 | 1490 | 1.1034 | 80888200 |
0.0787 | 0.8172 | 1495 | 1.1042 | 81161640 |
0.1314 | 0.8199 | 1500 | 1.1034 | 81437752 |
0.0726 | 0.8226 | 1505 | 1.1031 | 81704688 |
0.1182 | 0.8254 | 1510 | 1.1068 | 81973504 |
0.0969 | 0.8281 | 1515 | 1.1070 | 82248840 |
0.0697 | 0.8308 | 1520 | 1.1034 | 82522968 |
0.0819 | 0.8336 | 1525 | 1.1030 | 82797984 |
0.1532 | 0.8363 | 1530 | 1.1041 | 83072080 |
0.162 | 0.8390 | 1535 | 1.1044 | 83340776 |
0.1278 | 0.8418 | 1540 | 1.1041 | 83616904 |
0.122 | 0.8445 | 1545 | 1.1029 | 83889272 |
0.1666 | 0.8472 | 1550 | 1.1038 | 84161584 |
0.126 | 0.8500 | 1555 | 1.1046 | 84430304 |
0.188 | 0.8527 | 1560 | 1.1030 | 84702208 |
0.1373 | 0.8554 | 1565 | 1.1014 | 84979552 |
0.0774 | 0.8582 | 1570 | 1.1018 | 85245720 |
0.1417 | 0.8609 | 1575 | 1.1026 | 85512000 |
0.1333 | 0.8636 | 1580 | 1.1017 | 85792672 |
0.0951 | 0.8664 | 1585 | 1.1012 | 86068368 |
0.1222 | 0.8691 | 1590 | 1.1005 | 86330288 |
0.1686 | 0.8718 | 1595 | 1.1012 | 86604152 |
0.0758 | 0.8746 | 1600 | 1.1013 | 86867232 |
0.1297 | 0.8773 | 1605 | 1.1004 | 87136744 |
0.1517 | 0.8800 | 1610 | 1.0993 | 87410448 |
0.152 | 0.8828 | 1615 | 1.0995 | 87680312 |
0.0872 | 0.8855 | 1620 | 1.1014 | 87952128 |
0.1061 | 0.8882 | 1625 | 1.1019 | 88225928 |
0.1464 | 0.8910 | 1630 | 1.0986 | 88499456 |
0.0814 | 0.8937 | 1635 | 1.0981 | 88772656 |
0.1532 | 0.8964 | 1640 | 1.1006 | 89041152 |
0.1368 | 0.8992 | 1645 | 1.1019 | 89316136 |
0.1246 | 0.9019 | 1650 | 1.1008 | 89589816 |
0.1003 | 0.9046 | 1655 | 1.0988 | 89858640 |
0.133 | 0.9074 | 1660 | 1.0988 | 90123848 |
0.1467 | 0.9101 | 1665 | 1.1003 | 90394336 |
0.0839 | 0.9128 | 1670 | 1.0998 | 90665632 |
0.1506 | 0.9156 | 1675 | 1.0992 | 90941464 |
0.1321 | 0.9183 | 1680 | 1.0989 | 91215624 |
0.1109 | 0.9210 | 1685 | 1.1008 | 91482504 |
0.1145 | 0.9237 | 1690 | 1.1003 | 91751792 |
0.1013 | 0.9265 | 1695 | 1.0983 | 92024488 |
0.073 | 0.9292 | 1700 | 1.0970 | 92291128 |
0.1401 | 0.9319 | 1705 | 1.0985 | 92561912 |
0.1107 | 0.9347 | 1710 | 1.1009 | 92824936 |
0.057 | 0.9374 | 1715 | 1.1018 | 93099776 |
0.1711 | 0.9401 | 1720 | 1.1012 | 93377208 |
0.1538 | 0.9429 | 1725 | 1.1007 | 93649104 |
0.0917 | 0.9456 | 1730 | 1.1014 | 93921856 |
0.1319 | 0.9483 | 1735 | 1.1005 | 94194688 |
0.072 | 0.9511 | 1740 | 1.0998 | 94468368 |
0.1294 | 0.9538 | 1745 | 1.0999 | 94737480 |
0.1123 | 0.9565 | 1750 | 1.0992 | 95010008 |
0.1311 | 0.9593 | 1755 | 1.0983 | 95289936 |
0.0949 | 0.9620 | 1760 | 1.0981 | 95564760 |
0.0817 | 0.9647 | 1765 | 1.0985 | 95833472 |
0.1138 | 0.9675 | 1770 | 1.1000 | 96101768 |
0.1472 | 0.9702 | 1775 | 1.1003 | 96379520 |
0.0659 | 0.9729 | 1780 | 1.0998 | 96652048 |
0.0787 | 0.9757 | 1785 | 1.0989 | 96924520 |
0.1305 | 0.9784 | 1790 | 1.1001 | 97196088 |
0.1023 | 0.9811 | 1795 | 1.1011 | 97473568 |
0.0772 | 0.9839 | 1800 | 1.0999 | 97742344 |
0.1433 | 0.9866 | 1805 | 1.0983 | 98016040 |
0.0631 | 0.9893 | 1810 | 1.0966 | 98289688 |
0.0797 | 0.9921 | 1815 | 1.0985 | 98561640 |
0.1687 | 0.9948 | 1820 | 1.0997 | 98834208 |
0.1308 | 0.9975 | 1825 | 1.0984 | 99106032 |
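To work with the table programmatically, a minimal sketch follows. It parses a handful of rows copied from the table above (not the full log), then reports the sampled row with the lowest validation loss and the gap between validation and training loss at the last logged step. `parse_row` and `SAMPLE` are hypothetical names, and the pipe-delimited layout is assumed to match the table as shown.

```python
from typing import List, Optional, TypedDict


class Row(TypedDict):
    train_loss: Optional[float]  # None where the table says "No log"
    epoch: float
    step: int
    val_loss: float
    tokens: int


# A few rows copied verbatim from the training results table.
SAMPLE = """\
No log | 0 | 0 | 1.3909 | 0 |
1.2729 | 0.0246 | 45 | 1.1994 | 2444472 |
0.4101 | 0.0437 | 80 | 1.3692 | 4354936 |
0.073 | 0.9292 | 1700 | 1.0970 | 92291128 |
0.0631 | 0.9893 | 1810 | 1.0966 | 98289688 |
0.1308 | 0.9975 | 1825 | 1.0984 | 99106032 |
"""


def parse_row(line: str) -> Row:
    """Split one pipe-delimited row into typed fields."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    train, epoch, step, val, tokens = cells
    return Row(
        train_loss=None if train == "No log" else float(train),
        epoch=float(epoch),
        step=int(step),
        val_loss=float(val),
        tokens=int(tokens),
    )


records: List[Row] = [parse_row(l) for l in SAMPLE.splitlines() if l.strip()]

# Row with the lowest validation loss among the sampled rows.
best = min(records, key=lambda r: r["val_loss"])

# Validation minus training loss at the last logged step.
last = records[-1]
gap = last["val_loss"] - last["train_loss"]

print("best sampled step:", best["step"], "val loss:", best["val_loss"])
print("train/val gap at last step:", round(gap, 4))
```

Pasting the full table into `SAMPLE` would extend the same analysis to every logged step.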
Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter19_sftsd0
- Base model: google/gemma-2-2b