# collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd1
This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.0913
- Num Input Tokens Seen: 30890616
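
The card does not include a usage example. The snippet below is a minimal loading sketch, assuming the checkpoint is published on the Hugging Face Hub under `RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd1` and that `transformers` (plus `accelerate` for `device_map="auto"`) is installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; adjust if the checkpoint lives elsewhere.
model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the dtype stored in the checkpoint
    device_map="auto",    # requires the `accelerate` package
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```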
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
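
The training script itself is not part of this card; the block below is only a sketch of how the listed values map onto `transformers.TrainingArguments` under the standard `Trainer` setup. The `output_dir` name is illustrative.

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above; the actual
# training script is not included in this card.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd1",  # illustrative
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,  # 8 x 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```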
### Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
---|---|---|---|---|
No log | 0 | 0 | 1.3909 | 0 |
1.4951 | 0.0088 | 5 | 1.3821 | 271232 |
1.4563 | 0.0176 | 10 | 1.3208 | 543664 |
1.3667 | 0.0264 | 15 | 1.2559 | 821864 |
1.2468 | 0.0352 | 20 | 1.1999 | 1091344 |
1.1733 | 0.0441 | 25 | 1.1730 | 1367232 |
1.0267 | 0.0529 | 30 | 1.1804 | 1638224 |
0.864 | 0.0617 | 35 | 1.1806 | 1913808 |
0.7861 | 0.0705 | 40 | 1.1954 | 2189912 |
0.7807 | 0.0793 | 45 | 1.2160 | 2452272 |
0.5332 | 0.0881 | 50 | 1.2235 | 2725128 |
0.573 | 0.0969 | 55 | 1.2244 | 2999016 |
0.5712 | 0.1057 | 60 | 1.2128 | 3271968 |
0.4288 | 0.1146 | 65 | 1.1903 | 3549280 |
0.4498 | 0.1234 | 70 | 1.1855 | 3823400 |
0.348 | 0.1322 | 75 | 1.1776 | 4090400 |
0.4052 | 0.1410 | 80 | 1.1760 | 4365584 |
0.3448 | 0.1498 | 85 | 1.1634 | 4637496 |
0.3418 | 0.1586 | 90 | 1.1639 | 4900760 |
0.3926 | 0.1674 | 95 | 1.1575 | 5171952 |
0.4322 | 0.1762 | 100 | 1.1566 | 5443000 |
0.3339 | 0.1850 | 105 | 1.1545 | 5712944 |
0.4672 | 0.1939 | 110 | 1.1510 | 5985072 |
0.315 | 0.2027 | 115 | 1.1509 | 6252048 |
0.3656 | 0.2115 | 120 | 1.1440 | 6523272 |
0.4343 | 0.2203 | 125 | 1.1468 | 6796536 |
0.3248 | 0.2291 | 130 | 1.1404 | 7067320 |
0.3063 | 0.2379 | 135 | 1.1457 | 7335176 |
0.3174 | 0.2467 | 140 | 1.1412 | 7607696 |
0.2611 | 0.2555 | 145 | 1.1442 | 7880176 |
0.3732 | 0.2643 | 150 | 1.1361 | 8151896 |
0.275 | 0.2732 | 155 | 1.1407 | 8428120 |
0.2902 | 0.2820 | 160 | 1.1367 | 8702104 |
0.2883 | 0.2908 | 165 | 1.1359 | 8970264 |
0.2804 | 0.2996 | 170 | 1.1360 | 9242488 |
0.2668 | 0.3084 | 175 | 1.1313 | 9514312 |
0.3018 | 0.3172 | 180 | 1.1331 | 9792568 |
0.2895 | 0.3260 | 185 | 1.1287 | 10067840 |
0.319 | 0.3348 | 190 | 1.1288 | 10336576 |
0.2636 | 0.3437 | 195 | 1.1277 | 10614920 |
0.2802 | 0.3525 | 200 | 1.1280 | 10884976 |
0.3354 | 0.3613 | 205 | 1.1252 | 11161384 |
0.348 | 0.3701 | 210 | 1.1268 | 11432472 |
0.2536 | 0.3789 | 215 | 1.1230 | 11709552 |
0.2744 | 0.3877 | 220 | 1.1237 | 11979744 |
0.274 | 0.3965 | 225 | 1.1238 | 12250848 |
0.3241 | 0.4053 | 230 | 1.1207 | 12526408 |
0.3095 | 0.4141 | 235 | 1.1204 | 12793864 |
0.2996 | 0.4230 | 240 | 1.1202 | 13056144 |
0.2803 | 0.4318 | 245 | 1.1202 | 13331664 |
0.3346 | 0.4406 | 250 | 1.1167 | 13607696 |
0.2643 | 0.4494 | 255 | 1.1170 | 13877856 |
0.3123 | 0.4582 | 260 | 1.1186 | 14147416 |
0.3048 | 0.4670 | 265 | 1.1167 | 14418600 |
0.408 | 0.4758 | 270 | 1.1154 | 14693312 |
0.3059 | 0.4846 | 275 | 1.1167 | 14958704 |
0.2863 | 0.4934 | 280 | 1.1133 | 15234336 |
0.2354 | 0.5023 | 285 | 1.1144 | 15507664 |
0.2094 | 0.5111 | 290 | 1.1138 | 15779648 |
0.3262 | 0.5199 | 295 | 1.1116 | 16048520 |
0.2988 | 0.5287 | 300 | 1.1128 | 16315984 |
0.1602 | 0.5375 | 305 | 1.1114 | 16586704 |
0.2703 | 0.5463 | 310 | 1.1109 | 16856960 |
0.2671 | 0.5551 | 315 | 1.1105 | 17130984 |
0.2595 | 0.5639 | 320 | 1.1100 | 17405032 |
0.2584 | 0.5728 | 325 | 1.1103 | 17672464 |
0.2967 | 0.5816 | 330 | 1.1074 | 17940736 |
0.2693 | 0.5904 | 335 | 1.1111 | 18209096 |
0.2368 | 0.5992 | 340 | 1.1083 | 18489328 |
0.3227 | 0.6080 | 345 | 1.1095 | 18763392 |
0.2433 | 0.6168 | 350 | 1.1079 | 19033928 |
0.2663 | 0.6256 | 355 | 1.1064 | 19306496 |
0.2232 | 0.6344 | 360 | 1.1078 | 19582464 |
0.215 | 0.6432 | 365 | 1.1057 | 19855128 |
0.285 | 0.6521 | 370 | 1.1041 | 20118936 |
0.2812 | 0.6609 | 375 | 1.1047 | 20386944 |
0.2726 | 0.6697 | 380 | 1.1061 | 20661136 |
0.2298 | 0.6785 | 385 | 1.1036 | 20934448 |
0.2719 | 0.6873 | 390 | 1.1043 | 21212424 |
0.2636 | 0.6961 | 395 | 1.1053 | 21483592 |
0.2778 | 0.7049 | 400 | 1.1019 | 21759880 |
0.2443 | 0.7137 | 405 | 1.1011 | 22031808 |
0.3002 | 0.7225 | 410 | 1.1028 | 22308840 |
0.2201 | 0.7314 | 415 | 1.1026 | 22581432 |
0.3103 | 0.7402 | 420 | 1.1011 | 22852504 |
0.2672 | 0.7490 | 425 | 1.0994 | 23120392 |
0.3186 | 0.7578 | 430 | 1.1016 | 23393176 |
0.2821 | 0.7666 | 435 | 1.1007 | 23666664 |
0.3132 | 0.7754 | 440 | 1.0987 | 23941552 |
0.2671 | 0.7842 | 445 | 1.0978 | 24216152 |
0.1736 | 0.7930 | 450 | 1.0975 | 24490968 |
0.3105 | 0.8019 | 455 | 1.0980 | 24765600 |
0.3713 | 0.8107 | 460 | 1.0961 | 25042848 |
0.3498 | 0.8195 | 465 | 1.0968 | 25319312 |
0.2632 | 0.8283 | 470 | 1.0983 | 25596904 |
0.308 | 0.8371 | 475 | 1.0951 | 25873656 |
0.2886 | 0.8459 | 480 | 1.0952 | 26149160 |
0.2547 | 0.8547 | 485 | 1.0952 | 26423016 |
0.2806 | 0.8635 | 490 | 1.0948 | 26701520 |
0.2446 | 0.8723 | 495 | 1.0947 | 26970808 |
0.2854 | 0.8812 | 500 | 1.0940 | 27243712 |
0.2576 | 0.8900 | 505 | 1.0945 | 27513104 |
0.2532 | 0.8988 | 510 | 1.0961 | 27784952 |
0.3655 | 0.9076 | 515 | 1.0942 | 28053616 |
0.2836 | 0.9164 | 520 | 1.0941 | 28325080 |
0.2758 | 0.9252 | 525 | 1.0963 | 28595744 |
0.2029 | 0.9340 | 530 | 1.0943 | 28870736 |
0.2777 | 0.9428 | 535 | 1.0943 | 29146344 |
0.2305 | 0.9516 | 540 | 1.0959 | 29417184 |
0.3159 | 0.9605 | 545 | 1.0959 | 29684608 |
0.3386 | 0.9693 | 550 | 1.0919 | 29958936 |
0.1623 | 0.9781 | 555 | 1.0933 | 30227592 |
0.3154 | 0.9869 | 560 | 1.0950 | 30506240 |
0.2721 | 0.9957 | 565 | 1.0915 | 30779280 |
### Framework versions

- Transformers 4.44.0
- PyTorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1