---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd0
  results: []
---

# collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1141
- Num Input Tokens Seen: 38535136

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1

A minimal sketch mapping these settings onto `transformers.TrainingArguments` appears after the framework versions at the end of this card.

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.753         | 0.0072 | 5    | 1.3915          | 274800            |
| 1.645         | 0.0143 | 10   | 1.3471          | 553336            |
| 1.5097        | 0.0215 | 15   | 1.2784          | 825472            |
| 1.4043        | 0.0286 | 20   | 1.2252          | 1094400           |
| 1.2715        | 0.0358 | 25   | 1.1789          | 1368560           |
| 1.2172        | 0.0430 | 30   | 1.1681          | 1643752           |
| 1.1701        | 0.0501 | 35   | 1.1474          | 1919896           |
| 1.012         | 0.0573 | 40   | 1.1634          | 2202664           |
| 0.977         | 0.0645 | 45   | 1.1833          | 2485728           |
| 0.9504        | 0.0716 | 50   | 1.1892          | 2760656           |
| 0.8804        | 0.0788 | 55   | 1.2024          | 3030296           |
| 0.7706        | 0.0859 | 60   | 1.2172          | 3302344           |
| 0.7382        | 0.0931 | 65   | 1.2354          | 3580728           |
| 0.5907        | 0.1003 | 70   | 1.2216          | 3858040           |
| 0.5639        | 0.1074 | 75   | 1.2151          | 4131752           |
| 0.5866        | 0.1146 | 80   | 1.2167          | 4408168           |
| 0.6131        | 0.1217 | 85   | 1.2232          | 4684048           |
| 0.5387        | 0.1289 | 90   | 1.2203          | 4956816           |
| 0.588         | 0.1361 | 95   | 1.2124          | 5236664           |
| 0.5076        | 0.1432 | 100  | 1.2125          | 5512104           |
| 0.4164        | 0.1504 | 105  | 1.2181          | 5787680           |
| 0.4371        | 0.1576 | 110  | 1.2111          | 6061640           |
| 0.4415        | 0.1647 | 115  | 1.2035          | 6339744           |
| 0.4482        | 0.1719 | 120  | 1.2025          | 6616088           |
| 0.4337        | 0.1790 | 125  | 1.2025          | 6890352           |
| 0.4609        | 0.1862 | 130  | 1.1980          | 7161728           |
| 0.3955        | 0.1934 | 135  | 1.2066          | 7437056           |
| 0.4134        | 0.2005 | 140  | 1.1994          | 7714640           |
| 0.2926        | 0.2077 | 145  | 1.2044          | 7990680           |
| 0.5047        | 0.2148 | 150  | 1.1958          | 8272024           |
| 0.3491        | 0.2220 | 155  | 1.2003          | 8543152           |
| 0.3948        | 0.2292 | 160  | 1.1946          | 8817304           |
| 0.4029        | 0.2363 | 165  | 1.2019          | 9095752           |
| 0.2683        | 0.2435 | 170  | 1.1840          | 9367952           |
| 0.3407        | 0.2506 | 175  | 1.1988          | 9649744           |
| 0.3316        | 0.2578 | 180  | 1.1874          | 9915512           |
| 0.4204        | 0.2650 | 185  | 1.1885          | 10190280          |
| 0.2743        | 0.2721 | 190  | 1.1846          | 10465416          |
| 0.2852        | 0.2793 | 195  | 1.1833          | 10743016          |
| 0.3708        | 0.2865 | 200  | 1.1827          | 11018864          |
| 0.2405        | 0.2936 | 205  | 1.1810          | 11294712          |
| 0.3435        | 0.3008 | 210  | 1.1847          | 11566136          |
| 0.277         | 0.3079 | 215  | 1.1775          | 11839000          |
| 0.31          | 0.3151 | 220  | 1.1869          | 12110104          |
| 0.3004        | 0.3223 | 225  | 1.1719          | 12387072          |
| 0.2593        | 0.3294 | 230  | 1.1799          | 12659864          |
| 0.3017        | 0.3366 | 235  | 1.1710          | 12928592          |
| 0.3225        | 0.3437 | 240  | 1.1738          | 13203112          |
| 0.2976        | 0.3509 | 245  | 1.1753          | 13475880          |
| 0.2385        | 0.3581 | 250  | 1.1657          | 13751768          |
| 0.3222        | 0.3652 | 255  | 1.1733          | 14032088          |
| 0.2892        | 0.3724 | 260  | 1.1660          | 14306696          |
| 0.5871        | 0.3796 | 265  | 1.1624          | 14590560          |
| 0.3256        | 0.3867 | 270  | 1.1665          | 14862432          |
| 0.312         | 0.3939 | 275  | 1.1600          | 15143808          |
| 0.317         | 0.4010 | 280  | 1.1618          | 15415480          |
| 0.2964        | 0.4082 | 285  | 1.1640          | 15694936          |
| 0.3226        | 0.4154 | 290  | 1.1586          | 15974968          |
| 0.2756        | 0.4225 | 295  | 1.1595          | 16255032          |
| 0.2167        | 0.4297 | 300  | 1.1596          | 16539088          |
| 0.3576        | 0.4368 | 305  | 1.1566          | 16819088          |
| 0.2757        | 0.4440 | 310  | 1.1541          | 17100912          |
| 0.2413        | 0.4512 | 315  | 1.1550          | 17373744          |
| 0.3459        | 0.4583 | 320  | 1.1483          | 17647448          |
| 0.2882        | 0.4655 | 325  | 1.1493          | 17922920          |
| 0.2383        | 0.4727 | 330  | 1.1471          | 18194680          |
| 0.2872        | 0.4798 | 335  | 1.1510          | 18471192          |
| 0.2302        | 0.4870 | 340  | 1.1474          | 18747848          |
| 0.285         | 0.4941 | 345  | 1.1484          | 19026688          |
| 0.2765        | 0.5013 | 350  | 1.1456          | 19293616          |
| 0.1756        | 0.5085 | 355  | 1.1435          | 19570744          |
| 0.303         | 0.5156 | 360  | 1.1457          | 19845048          |
| 0.2726        | 0.5228 | 365  | 1.1422          | 20115096          |
| 0.2625        | 0.5299 | 370  | 1.1423          | 20395336          |
| 0.2419        | 0.5371 | 375  | 1.1430          | 20667208          |
| 0.1856        | 0.5443 | 380  | 1.1388          | 20948560          |
| 0.3427        | 0.5514 | 385  | 1.1400          | 21218968          |
| 0.2147        | 0.5586 | 390  | 1.1354          | 21489088          |
| 0.2514        | 0.5658 | 395  | 1.1387          | 21764248          |
| 0.293         | 0.5729 | 400  | 1.1345          | 22038944          |
| 0.2699        | 0.5801 | 405  | 1.1349          | 22312360          |
| 0.2219        | 0.5872 | 410  | 1.1353          | 22589016          |
| 0.3573        | 0.5944 | 415  | 1.1305          | 22864576          |
| 0.343         | 0.6016 | 420  | 1.1355          | 23144760          |
| 0.2924        | 0.6087 | 425  | 1.1347          | 23421952          |
| 0.2846        | 0.6159 | 430  | 1.1293          | 23700352          |
| 0.2971        | 0.6230 | 435  | 1.1328          | 23983624          |
| 0.2037        | 0.6302 | 440  | 1.1312          | 24263512          |
| 0.29          | 0.6374 | 445  | 1.1309          | 24530624          |
| 0.2089        | 0.6445 | 450  | 1.1317          | 24800848          |
| 0.2477        | 0.6517 | 455  | 1.1318          | 25080464          |
| 0.2275        | 0.6588 | 460  | 1.1265          | 25356832          |
| 0.2335        | 0.6660 | 465  | 1.1285          | 25638344          |
| 0.1839        | 0.6732 | 470  | 1.1326          | 25912488          |
| 0.2514        | 0.6803 | 475  | 1.1276          | 26189888          |
| 0.3751        | 0.6875 | 480  | 1.1271          | 26472040          |
| 0.2701        | 0.6947 | 485  | 1.1260          | 26753624          |
| 0.2235        | 0.7018 | 490  | 1.1254          | 27029592          |
| 0.244         | 0.7090 | 495  | 1.1246          | 27311520          |
| 0.2294        | 0.7161 | 500  | 1.1231          | 27586432          |
| 0.2949        | 0.7233 | 505  | 1.1247          | 27860176          |
| 0.1593        | 0.7305 | 510  | 1.1254          | 28137160          |
| 0.2553        | 0.7376 | 515  | 1.1257          | 28418864          |
| 0.1885        | 0.7448 | 520  | 1.1249          | 28696856          |
| 0.2695        | 0.7519 | 525  | 1.1251          | 28975192          |
| 0.2545        | 0.7591 | 530  | 1.1214          | 29251760          |
| 0.2446        | 0.7663 | 535  | 1.1211          | 29528808          |
| 0.3202        | 0.7734 | 540  | 1.1233          | 29803128          |
| 0.2623        | 0.7806 | 545  | 1.1200          | 30079416          |
| 0.2142        | 0.7878 | 550  | 1.1205          | 30352064          |
| 0.2502        | 0.7949 | 555  | 1.1210          | 30629824          |
| 0.3042        | 0.8021 | 560  | 1.1180          | 30904272          |
| 0.197         | 0.8092 | 565  | 1.1196          | 31174976          |
| 0.2593        | 0.8164 | 570  | 1.1191          | 31446624          |
| 0.3324        | 0.8236 | 575  | 1.1183          | 31729592          |
| 0.2113        | 0.8307 | 580  | 1.1203          | 32004000          |
| 0.2764        | 0.8379 | 585  | 1.1196          | 32277080          |
| 0.2863        | 0.8450 | 590  | 1.1166          | 32551352          |
| 0.1917        | 0.8522 | 595  | 1.1213          | 32831496          |
| 0.1784        | 0.8594 | 600  | 1.1194          | 33113448          |
| 0.2198        | 0.8665 | 605  | 1.1173          | 33387680          |
| 0.3067        | 0.8737 | 610  | 1.1185          | 33664656          |
| 0.2372        | 0.8809 | 615  | 1.1154          | 33938472          |
| 0.2207        | 0.8880 | 620  | 1.1172          | 34216144          |
| 0.2026        | 0.8952 | 625  | 1.1177          | 34487704          |
| 0.2003        | 0.9023 | 630  | 1.1144          | 34767944          |
| 0.2438        | 0.9095 | 635  | 1.1178          | 35042160          |
| 0.3055        | 0.9167 | 640  | 1.1154          | 35322704          |
| 0.2598        | 0.9238 | 645  | 1.1137          | 35599184          |
| 0.2283        | 0.9310 | 650  | 1.1163          | 35874368          |
| 0.2463        | 0.9381 | 655  | 1.1152          | 36142120          |
| 0.2388        | 0.9453 | 660  | 1.1133          | 36411336          |
| 0.2284        | 0.9525 | 665  | 1.1161          | 36697696          |
| 0.2146        | 0.9596 | 670  | 1.1133          | 36973112          |
| 0.2494        | 0.9668 | 675  | 1.1151          | 37252568          |
| 0.2118        | 0.9740 | 680  | 1.1151          | 37528656          |
| 0.2539        | 0.9811 | 685  | 1.1131          | 37804520          |
| 0.2345        | 0.9883 | 690  | 1.1137          | 38078360          |
| 0.2216        | 0.9954 | 695  | 1.1142          | 38361640          |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
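
As referenced under the training hyperparameters above, the following is a minimal sketch of how the listed settings map onto `transformers.TrainingArguments`. It is not the original training script: `output_dir` is a hypothetical placeholder, and the model, dataset, and trainer wiring from the actual run are omitted.

```python
from transformers import TrainingArguments

# Hedged reconstruction of the hyperparameters listed in this card.
# output_dir is a placeholder, not the path used in the original run.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd0",
    learning_rate=8e-06,
    per_device_train_batch_size=8,    # train_batch_size: 8
    per_device_eval_batch_size=16,    # eval_batch_size: 16
    gradient_accumulation_steps=16,   # 8 * 16 = 128 total (assuming one device)
    seed=0,
    adam_beta1=0.9,                   # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,                # lr_scheduler_warmup_ratio: 0.05
    num_train_epochs=1,
)
```

The `constant_with_warmup` schedule holds the learning rate at 8e-06 after the warmup phase (5% of total steps), which is consistent with the flat late-training validation loss in the table above.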
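
The card itself gives no inference instructions; the snippet below is a minimal loading sketch with `transformers`, assuming the checkpoint is published under the model name shown in this card. Replace `model_id` with the actual hub id or a local path.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; substitute the real hub path or a local checkpoint.
model_id = "collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Quick generation smoke test.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```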