# collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd2
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.1029
- Num Input Tokens Seen: 41091672
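
The checkpoint should load through the standard `transformers` causal-LM API like its Gemma base. A minimal usage sketch (the prompt and generation settings are illustrative, not from the original evaluation):

```python
# Minimal usage sketch; assumes this checkpoint loads like its
# google/gemma-2-2b base via the standard transformers API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)  # illustrative settings
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```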
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
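
The total train batch size follows from the per-device batch size and accumulation: 8 × 16 = 128 sequences per optimizer step (on a single device). As a sketch, an equivalent `transformers` `TrainingArguments` configuration would look roughly like this; the output path is a hypothetical placeholder, and the Adam betas/epsilon listed above match the library defaults:

```python
# Configuration sketch reconstructed from the hyperparameter list above;
# output_dir is hypothetical, and Adam betas/epsilon are the defaults.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd2",  # hypothetical
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 effective batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```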
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.5665 | 0.0066 | 5 | 1.3873 | 272560 |
1.5456 | 0.0132 | 10 | 1.3529 | 547080 |
1.4344 | 0.0198 | 15 | 1.2836 | 822880 |
1.4512 | 0.0264 | 20 | 1.2345 | 1089864 |
1.3462 | 0.0330 | 25 | 1.1901 | 1361576 |
1.1712 | 0.0396 | 30 | 1.1835 | 1634504 |
1.0826 | 0.0462 | 35 | 1.1964 | 1895936 |
0.9291 | 0.0527 | 40 | 1.1914 | 2166120 |
0.8296 | 0.0593 | 45 | 1.2208 | 2435904 |
0.6654 | 0.0659 | 50 | 1.2499 | 2706240 |
0.6401 | 0.0725 | 55 | 1.2356 | 2984976 |
0.6449 | 0.0791 | 60 | 1.2089 | 3257728 |
0.5585 | 0.0857 | 65 | 1.2026 | 3526976 |
0.468 | 0.0923 | 70 | 1.2120 | 3804888 |
0.5271 | 0.0989 | 75 | 1.2040 | 4078544 |
0.3901 | 0.1055 | 80 | 1.1976 | 4356048 |
0.4389 | 0.1121 | 85 | 1.2049 | 4621624 |
0.3482 | 0.1187 | 90 | 1.1972 | 4888632 |
0.3224 | 0.1253 | 95 | 1.1926 | 5152168 |
0.4305 | 0.1319 | 100 | 1.1944 | 5423968 |
0.3758 | 0.1385 | 105 | 1.1825 | 5697240 |
0.3646 | 0.1450 | 110 | 1.1919 | 5971384 |
0.3215 | 0.1516 | 115 | 1.1776 | 6240360 |
0.3273 | 0.1582 | 120 | 1.1907 | 6509288 |
0.3152 | 0.1648 | 125 | 1.1786 | 6779048 |
0.2365 | 0.1714 | 130 | 1.1833 | 7048200 |
0.3342 | 0.1780 | 135 | 1.1750 | 7316656 |
0.3586 | 0.1846 | 140 | 1.1774 | 7590728 |
0.2927 | 0.1912 | 145 | 1.1737 | 7859680 |
0.3788 | 0.1978 | 150 | 1.1760 | 8126224 |
0.2964 | 0.2044 | 155 | 1.1741 | 8403808 |
0.2938 | 0.2110 | 160 | 1.1677 | 8672216 |
0.2518 | 0.2176 | 165 | 1.1735 | 8946264 |
0.3334 | 0.2242 | 170 | 1.1647 | 9208352 |
0.311 | 0.2308 | 175 | 1.1647 | 9477208 |
0.3065 | 0.2373 | 180 | 1.1620 | 9748024 |
0.2517 | 0.2439 | 185 | 1.1613 | 10021768 |
0.2672 | 0.2505 | 190 | 1.1569 | 10293208 |
0.2611 | 0.2571 | 195 | 1.1545 | 10569280 |
0.2265 | 0.2637 | 200 | 1.1548 | 10840984 |
0.3068 | 0.2703 | 205 | 1.1520 | 11116568 |
0.2929 | 0.2769 | 210 | 1.1568 | 11394928 |
0.3351 | 0.2835 | 215 | 1.1547 | 11666600 |
0.2687 | 0.2901 | 220 | 1.1544 | 11946656 |
0.2501 | 0.2967 | 225 | 1.1479 | 12224240 |
0.1991 | 0.3033 | 230 | 1.1520 | 12500672 |
0.2434 | 0.3099 | 235 | 1.1477 | 12767840 |
0.1667 | 0.3165 | 240 | 1.1453 | 13035688 |
0.2564 | 0.3231 | 245 | 1.1509 | 13312232 |
0.2856 | 0.3297 | 250 | 1.1436 | 13584328 |
0.305 | 0.3362 | 255 | 1.1425 | 13853288 |
0.2765 | 0.3428 | 260 | 1.1456 | 14113512 |
0.2209 | 0.3494 | 265 | 1.1455 | 14385280 |
0.2125 | 0.3560 | 270 | 1.1410 | 14660096 |
0.274 | 0.3626 | 275 | 1.1417 | 14931976 |
0.2181 | 0.3692 | 280 | 1.1411 | 15202008 |
0.2481 | 0.3758 | 285 | 1.1374 | 15468896 |
0.2629 | 0.3824 | 290 | 1.1372 | 15733744 |
0.2826 | 0.3890 | 295 | 1.1366 | 16004424 |
0.2646 | 0.3956 | 300 | 1.1363 | 16276088 |
0.2729 | 0.4022 | 305 | 1.1333 | 16547304 |
0.2735 | 0.4088 | 310 | 1.1350 | 16819224 |
0.2881 | 0.4154 | 315 | 1.1349 | 17088704 |
0.2208 | 0.4220 | 320 | 1.1304 | 17362560 |
0.1822 | 0.4285 | 325 | 1.1348 | 17632840 |
0.3197 | 0.4351 | 330 | 1.1306 | 17903232 |
0.1763 | 0.4417 | 335 | 1.1287 | 18171208 |
0.2851 | 0.4483 | 340 | 1.1333 | 18444312 |
0.2406 | 0.4549 | 345 | 1.1318 | 18716768 |
0.2571 | 0.4615 | 350 | 1.1291 | 18983016 |
0.3931 | 0.4681 | 355 | 1.1282 | 19256840 |
0.1952 | 0.4747 | 360 | 1.1287 | 19527776 |
0.227 | 0.4813 | 365 | 1.1282 | 19800232 |
0.2979 | 0.4879 | 370 | 1.1285 | 20074720 |
0.1515 | 0.4945 | 375 | 1.1280 | 20350824 |
0.336 | 0.5011 | 380 | 1.1254 | 20627392 |
0.2381 | 0.5077 | 385 | 1.1258 | 20900344 |
0.2331 | 0.5143 | 390 | 1.1253 | 21173120 |
0.2176 | 0.5209 | 395 | 1.1250 | 21442720 |
0.232 | 0.5274 | 400 | 1.1268 | 21711376 |
0.2648 | 0.5340 | 405 | 1.1246 | 21977752 |
0.2398 | 0.5406 | 410 | 1.1241 | 22247224 |
0.2246 | 0.5472 | 415 | 1.1245 | 22525976 |
0.2836 | 0.5538 | 420 | 1.1199 | 22795472 |
0.242 | 0.5604 | 425 | 1.1233 | 23063720 |
0.2369 | 0.5670 | 430 | 1.1230 | 23333144 |
0.2856 | 0.5736 | 435 | 1.1206 | 23599032 |
0.2595 | 0.5802 | 440 | 1.1208 | 23871616 |
0.2154 | 0.5868 | 445 | 1.1188 | 24144160 |
0.2541 | 0.5934 | 450 | 1.1208 | 24412552 |
0.2378 | 0.6000 | 455 | 1.1210 | 24683400 |
0.233 | 0.6066 | 460 | 1.1183 | 24956656 |
0.3136 | 0.6132 | 465 | 1.1211 | 25235888 |
0.2549 | 0.6197 | 470 | 1.1185 | 25505944 |
0.259 | 0.6263 | 475 | 1.1179 | 25776080 |
0.1539 | 0.6329 | 480 | 1.1197 | 26043984 |
0.2459 | 0.6395 | 485 | 1.1183 | 26318896 |
0.2342 | 0.6461 | 490 | 1.1182 | 26585616 |
0.2173 | 0.6527 | 495 | 1.1172 | 26862168 |
0.3048 | 0.6593 | 500 | 1.1172 | 27130760 |
0.2851 | 0.6659 | 505 | 1.1142 | 27397928 |
0.2091 | 0.6725 | 510 | 1.1148 | 27670712 |
0.3143 | 0.6791 | 515 | 1.1149 | 27933056 |
0.1672 | 0.6857 | 520 | 1.1152 | 28201952 |
0.3181 | 0.6923 | 525 | 1.1164 | 28477464 |
0.1914 | 0.6989 | 530 | 1.1174 | 28743664 |
0.2931 | 0.7055 | 535 | 1.1155 | 29016592 |
0.2285 | 0.7120 | 540 | 1.1133 | 29283872 |
0.2749 | 0.7186 | 545 | 1.1163 | 29554240 |
0.2901 | 0.7252 | 550 | 1.1145 | 29821128 |
0.2361 | 0.7318 | 555 | 1.1114 | 30095352 |
0.2654 | 0.7384 | 560 | 1.1125 | 30371160 |
0.1935 | 0.7450 | 565 | 1.1129 | 30645928 |
0.268 | 0.7516 | 570 | 1.1101 | 30919376 |
0.1795 | 0.7582 | 575 | 1.1139 | 31186848 |
0.2439 | 0.7648 | 580 | 1.1122 | 31459480 |
0.259 | 0.7714 | 585 | 1.1091 | 31733560 |
0.248 | 0.7780 | 590 | 1.1105 | 32003016 |
0.2186 | 0.7846 | 595 | 1.1106 | 32278448 |
0.1595 | 0.7912 | 600 | 1.1115 | 32538192 |
0.2058 | 0.7978 | 605 | 1.1117 | 32816064 |
0.2324 | 0.8044 | 610 | 1.1095 | 33087144 |
0.2045 | 0.8109 | 615 | 1.1094 | 33353000 |
0.2333 | 0.8175 | 620 | 1.1095 | 33621888 |
0.2159 | 0.8241 | 625 | 1.1076 | 33888104 |
0.2866 | 0.8307 | 630 | 1.1094 | 34159240 |
0.2268 | 0.8373 | 635 | 1.1101 | 34430064 |
0.1753 | 0.8439 | 640 | 1.1100 | 34700128 |
0.2076 | 0.8505 | 645 | 1.1089 | 34968768 |
0.1912 | 0.8571 | 650 | 1.1069 | 35250136 |
0.1534 | 0.8637 | 655 | 1.1074 | 35524024 |
0.1424 | 0.8703 | 660 | 1.1083 | 35789520 |
0.2325 | 0.8769 | 665 | 1.1076 | 36067376 |
0.2607 | 0.8835 | 670 | 1.1046 | 36340512 |
0.234 | 0.8901 | 675 | 1.1048 | 36603160 |
0.232 | 0.8967 | 680 | 1.1081 | 36872480 |
0.2998 | 0.9032 | 685 | 1.1080 | 37146736 |
0.1921 | 0.9098 | 690 | 1.1045 | 37414776 |
0.2492 | 0.9164 | 695 | 1.1060 | 37685600 |
0.27 | 0.9230 | 700 | 1.1068 | 37949648 |
0.2159 | 0.9296 | 705 | 1.1046 | 38226312 |
0.1912 | 0.9362 | 710 | 1.1062 | 38502072 |
0.23 | 0.9428 | 715 | 1.1076 | 38772744 |
0.3387 | 0.9494 | 720 | 1.1054 | 39041632 |
0.23 | 0.9560 | 725 | 1.1051 | 39313560 |
0.2785 | 0.9626 | 730 | 1.1065 | 39585992 |
0.2116 | 0.9692 | 735 | 1.1030 | 39856632 |
0.2378 | 0.9758 | 740 | 1.1040 | 40120176 |
0.2006 | 0.9824 | 745 | 1.1046 | 40392064 |
0.2418 | 0.9890 | 750 | 1.1024 | 40664776 |
0.2041 | 0.9955 | 755 | 1.1028 | 40931592 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1