
collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0913
  • Num Input Tokens Seen: 30,890,616
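For quick reference, the sketch below shows one way to load this checkpoint with the transformers library; the repository id is the one this card is published under, and the prompt is purely illustrative. Note that an evaluation loss of 1.0913 corresponds to a perplexity of exp(1.0913) ≈ 2.98 on the (unspecified) evaluation set.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Illustrative generation only; this card does not document intended prompts.
prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```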

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for how they might be expressed as Hugging Face TrainingArguments):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
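
A minimal sketch of how these values might map onto Hugging Face TrainingArguments. The output_dir and the bf16 flag are assumptions added for illustration, not settings recorded in this card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd1",  # placeholder
    learning_rate=8e-06,
    per_device_train_batch_size=8,    # train_batch_size
    per_device_eval_batch_size=16,    # eval_batch_size
    seed=1,
    gradient_accumulation_steps=16,   # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,                        # assumption; the published weights are BF16
)
```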

Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen
No log 0 0 1.3909 0
1.4951 0.0088 5 1.3821 271232
1.4563 0.0176 10 1.3208 543664
1.3667 0.0264 15 1.2559 821864
1.2468 0.0352 20 1.1999 1091344
1.1733 0.0441 25 1.1730 1367232
1.0267 0.0529 30 1.1804 1638224
0.864 0.0617 35 1.1806 1913808
0.7861 0.0705 40 1.1954 2189912
0.7807 0.0793 45 1.2160 2452272
0.5332 0.0881 50 1.2235 2725128
0.573 0.0969 55 1.2244 2999016
0.5712 0.1057 60 1.2128 3271968
0.4288 0.1146 65 1.1903 3549280
0.4498 0.1234 70 1.1855 3823400
0.348 0.1322 75 1.1776 4090400
0.4052 0.1410 80 1.1760 4365584
0.3448 0.1498 85 1.1634 4637496
0.3418 0.1586 90 1.1639 4900760
0.3926 0.1674 95 1.1575 5171952
0.4322 0.1762 100 1.1566 5443000
0.3339 0.1850 105 1.1545 5712944
0.4672 0.1939 110 1.1510 5985072
0.315 0.2027 115 1.1509 6252048
0.3656 0.2115 120 1.1440 6523272
0.4343 0.2203 125 1.1468 6796536
0.3248 0.2291 130 1.1404 7067320
0.3063 0.2379 135 1.1457 7335176
0.3174 0.2467 140 1.1412 7607696
0.2611 0.2555 145 1.1442 7880176
0.3732 0.2643 150 1.1361 8151896
0.275 0.2732 155 1.1407 8428120
0.2902 0.2820 160 1.1367 8702104
0.2883 0.2908 165 1.1359 8970264
0.2804 0.2996 170 1.1360 9242488
0.2668 0.3084 175 1.1313 9514312
0.3018 0.3172 180 1.1331 9792568
0.2895 0.3260 185 1.1287 10067840
0.319 0.3348 190 1.1288 10336576
0.2636 0.3437 195 1.1277 10614920
0.2802 0.3525 200 1.1280 10884976
0.3354 0.3613 205 1.1252 11161384
0.348 0.3701 210 1.1268 11432472
0.2536 0.3789 215 1.1230 11709552
0.2744 0.3877 220 1.1237 11979744
0.274 0.3965 225 1.1238 12250848
0.3241 0.4053 230 1.1207 12526408
0.3095 0.4141 235 1.1204 12793864
0.2996 0.4230 240 1.1202 13056144
0.2803 0.4318 245 1.1202 13331664
0.3346 0.4406 250 1.1167 13607696
0.2643 0.4494 255 1.1170 13877856
0.3123 0.4582 260 1.1186 14147416
0.3048 0.4670 265 1.1167 14418600
0.408 0.4758 270 1.1154 14693312
0.3059 0.4846 275 1.1167 14958704
0.2863 0.4934 280 1.1133 15234336
0.2354 0.5023 285 1.1144 15507664
0.2094 0.5111 290 1.1138 15779648
0.3262 0.5199 295 1.1116 16048520
0.2988 0.5287 300 1.1128 16315984
0.1602 0.5375 305 1.1114 16586704
0.2703 0.5463 310 1.1109 16856960
0.2671 0.5551 315 1.1105 17130984
0.2595 0.5639 320 1.1100 17405032
0.2584 0.5728 325 1.1103 17672464
0.2967 0.5816 330 1.1074 17940736
0.2693 0.5904 335 1.1111 18209096
0.2368 0.5992 340 1.1083 18489328
0.3227 0.6080 345 1.1095 18763392
0.2433 0.6168 350 1.1079 19033928
0.2663 0.6256 355 1.1064 19306496
0.2232 0.6344 360 1.1078 19582464
0.215 0.6432 365 1.1057 19855128
0.285 0.6521 370 1.1041 20118936
0.2812 0.6609 375 1.1047 20386944
0.2726 0.6697 380 1.1061 20661136
0.2298 0.6785 385 1.1036 20934448
0.2719 0.6873 390 1.1043 21212424
0.2636 0.6961 395 1.1053 21483592
0.2778 0.7049 400 1.1019 21759880
0.2443 0.7137 405 1.1011 22031808
0.3002 0.7225 410 1.1028 22308840
0.2201 0.7314 415 1.1026 22581432
0.3103 0.7402 420 1.1011 22852504
0.2672 0.7490 425 1.0994 23120392
0.3186 0.7578 430 1.1016 23393176
0.2821 0.7666 435 1.1007 23666664
0.3132 0.7754 440 1.0987 23941552
0.2671 0.7842 445 1.0978 24216152
0.1736 0.7930 450 1.0975 24490968
0.3105 0.8019 455 1.0980 24765600
0.3713 0.8107 460 1.0961 25042848
0.3498 0.8195 465 1.0968 25319312
0.2632 0.8283 470 1.0983 25596904
0.308 0.8371 475 1.0951 25873656
0.2886 0.8459 480 1.0952 26149160
0.2547 0.8547 485 1.0952 26423016
0.2806 0.8635 490 1.0948 26701520
0.2446 0.8723 495 1.0947 26970808
0.2854 0.8812 500 1.0940 27243712
0.2576 0.8900 505 1.0945 27513104
0.2532 0.8988 510 1.0961 27784952
0.3655 0.9076 515 1.0942 28053616
0.2836 0.9164 520 1.0941 28325080
0.2758 0.9252 525 1.0963 28595744
0.2029 0.9340 530 1.0943 28870736
0.2777 0.9428 535 1.0943 29146344
0.2305 0.9516 540 1.0959 29417184
0.3159 0.9605 545 1.0959 29684608
0.3386 0.9693 550 1.0919 29958936
0.1623 0.9781 555 1.0933 30227592
0.3154 0.9869 560 1.0950 30506240
0.2721 0.9957 565 1.0915 30779280
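
To visualize the trend in the table above, the snippet below plots a handful of (input tokens seen, validation loss) points copied from the log; matplotlib is assumed to be available and the selection of points is arbitrary.

```python
import matplotlib.pyplot as plt

# A few (input_tokens_seen, validation_loss) points taken from the table above.
points = [
    (0, 1.3909),
    (1_367_232, 1.1730),
    (5_443_000, 1.1566),
    (10_884_976, 1.1280),
    (16_048_520, 1.1116),
    (21_759_880, 1.1019),
    (27_243_712, 1.0940),
    (30_779_280, 1.0915),
]
tokens, val_loss = zip(*points)
plt.plot(tokens, val_loss, marker="o")
plt.xlabel("Input tokens seen")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-2b_hs2_accumulate_iter6_sftsd1")
plt.show()
```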

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1