
collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set (a minimal loading sketch follows these results):

  • Loss: 1.0940
  • Num Input Tokens Seen: 41008256
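
For quick use, here is a minimal sketch of loading this checkpoint with the Transformers auto classes. The repository id is taken from the model tree at the bottom of this card, BF16 matches the published tensor type, and the prompt and generation settings are illustrative assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id from the model tree at the bottom of this card.
model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are published in BF16
    device_map="auto",           # assumes `accelerate` is installed
)

# Illustrative prompt and generation settings, not part of the original card.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```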

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged TrainingArguments sketch follows this list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
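
These settings map onto the Hugging Face Trainer roughly as below; a minimal sketch, assuming a single GPU (so 8 per-device sequences x 16 accumulation steps gives the total train batch size of 128) and the standard TrainingArguments parameter names; the output_dir is illustrative:

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters above as TrainingArguments.
# Assumes one GPU: 8 per-device x 16 accumulation steps = 128 effective batch.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd1",  # illustrative
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```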

Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen
No log 0 0 1.3909 0
1.622 0.0066 5 1.3875 269304
1.5034 0.0131 10 1.3529 535392
1.4652 0.0197 15 1.2845 801488
1.3824 0.0263 20 1.2330 1083632
1.3513 0.0329 25 1.1882 1355072
1.1303 0.0394 30 1.1821 1635432
0.9928 0.0460 35 1.1924 1894472
0.8215 0.0526 40 1.2128 2161232
0.8303 0.0592 45 1.2421 2428280
0.5895 0.0657 50 1.2467 2702640
0.5274 0.0723 55 1.2585 2973544
0.4315 0.0789 60 1.2433 3236296
0.4844 0.0855 65 1.2217 3506176
0.3115 0.0920 70 1.2198 3780160
0.3854 0.0986 75 1.2028 4051568
0.3065 0.1052 80 1.1925 4324928
0.3682 0.1118 85 1.1846 4593592
0.5041 0.1183 90 1.1806 4867408
0.2775 0.1249 95 1.1759 5128832
0.2909 0.1315 100 1.1737 5401472
0.3715 0.1381 105 1.1742 5673312
0.3444 0.1446 110 1.1667 5945400
0.3783 0.1512 115 1.1666 6217600
0.2508 0.1578 120 1.1635 6483312
0.2896 0.1644 125 1.1591 6757952
0.2647 0.1709 130 1.1586 7031456
0.1641 0.1775 135 1.1563 7296128
0.2283 0.1841 140 1.1550 7571176
0.2946 0.1906 145 1.1524 7847912
0.2922 0.1972 150 1.1484 8116960
0.2966 0.2038 155 1.1481 8393608
0.268 0.2104 160 1.1539 8663712
0.2847 0.2169 165 1.1498 8925096
0.2498 0.2235 170 1.1483 9194968
0.2431 0.2301 175 1.1496 9464256
0.2411 0.2367 180 1.1453 9727032
0.2876 0.2432 185 1.1429 9997984
0.3148 0.2498 190 1.1435 10271224
0.2655 0.2564 195 1.1408 10546488
0.2446 0.2630 200 1.1415 10805248
0.2493 0.2695 205 1.1428 11074256
0.2977 0.2761 210 1.1383 11346264
0.3008 0.2827 215 1.1380 11612816
0.212 0.2893 220 1.1349 11891040
0.2596 0.2958 225 1.1377 12163592
0.1793 0.3024 230 1.1370 12425752
0.248 0.3090 235 1.1325 12694640
0.2415 0.3156 240 1.1331 12963992
0.2047 0.3221 245 1.1319 13234768
0.1848 0.3287 250 1.1310 13511432
0.1624 0.3353 255 1.1309 13785032
0.2183 0.3419 260 1.1269 14052560
0.2079 0.3484 265 1.1321 14318664
0.1957 0.3550 270 1.1292 14591392
0.1832 0.3616 275 1.1273 14857944
0.2016 0.3681 280 1.1240 15133456
0.2329 0.3747 285 1.1258 15404048
0.2867 0.3813 290 1.1256 15674488
0.2546 0.3879 295 1.1245 15950072
0.2182 0.3944 300 1.1226 16211512
0.2931 0.4010 305 1.1222 16484192
0.2325 0.4076 310 1.1228 16754264
0.2637 0.4142 315 1.1211 17023608
0.1728 0.4207 320 1.1188 17305976
0.2263 0.4273 325 1.1195 17575456
0.2625 0.4339 330 1.1184 17840744
0.1631 0.4405 335 1.1177 18105176
0.1778 0.4470 340 1.1180 18369064
0.327 0.4536 345 1.1150 18635856
0.2488 0.4602 350 1.1160 18906504
0.2863 0.4668 355 1.1146 19171744
0.2554 0.4733 360 1.1152 19443216
0.2097 0.4799 365 1.1171 19710312
0.2428 0.4865 370 1.1147 19983280
0.1757 0.4931 375 1.1157 20253048
0.2844 0.4996 380 1.1143 20521536
0.2519 0.5062 385 1.1135 20793304
0.14 0.5128 390 1.1135 21056880
0.175 0.5194 395 1.1139 21322760
0.2719 0.5259 400 1.1138 21588632
0.2211 0.5325 405 1.1119 21863192
0.2711 0.5391 410 1.1115 22136640
0.2192 0.5456 415 1.1097 22400024
0.2555 0.5522 420 1.1088 22663600
0.2381 0.5588 425 1.1071 22931864
0.287 0.5654 430 1.1090 23211784
0.2197 0.5719 435 1.1079 23473528
0.1785 0.5785 440 1.1071 23741512
0.1782 0.5851 445 1.1088 24013864
0.1792 0.5917 450 1.1081 24283944
0.2492 0.5982 455 1.1053 24555032
0.2555 0.6048 460 1.1070 24818080
0.2014 0.6114 465 1.1091 25091208
0.1869 0.6180 470 1.1049 25354352
0.2532 0.6245 475 1.1049 25626256
0.2373 0.6311 480 1.1082 25900944
0.1992 0.6377 485 1.1064 26173568
0.2187 0.6443 490 1.1063 26447272
0.2218 0.6508 495 1.1089 26715952
0.2322 0.6574 500 1.1061 26983200
0.2482 0.6640 505 1.1060 27247440
0.1582 0.6706 510 1.1054 27515256
0.2757 0.6771 515 1.1051 27778344
0.1809 0.6837 520 1.1047 28049984
0.2369 0.6903 525 1.1042 28324744
0.2848 0.6969 530 1.1050 28589688
0.2827 0.7034 535 1.1021 28861280
0.2411 0.7100 540 1.1027 29129832
0.2118 0.7166 545 1.1020 29399128
0.1694 0.7231 550 1.1019 29669072
0.234 0.7297 555 1.1027 29932936
0.2118 0.7363 560 1.1031 30200984
0.2381 0.7429 565 1.1006 30467952
0.2596 0.7494 570 1.1016 30740152
0.2517 0.7560 575 1.1025 31013280
0.2295 0.7626 580 1.1009 31283736
0.2093 0.7692 585 1.1000 31546048
0.2714 0.7757 590 1.1016 31810008
0.1723 0.7823 595 1.0997 32082696
0.2339 0.7889 600 1.0983 32349272
0.2226 0.7955 605 1.0987 32617856
0.24 0.8020 610 1.0993 32890144
0.2459 0.8086 615 1.0978 33155616
0.2352 0.8152 620 1.0977 33421616
0.1846 0.8218 625 1.1003 33689760
0.1827 0.8283 630 1.0984 33954944
0.2186 0.8349 635 1.0991 34220096
0.1833 0.8415 640 1.1003 34487888
0.2651 0.8481 645 1.0984 34759656
0.2547 0.8546 650 1.0970 35032040
0.1985 0.8612 655 1.0965 35302816
0.2972 0.8678 660 1.0979 35576712
0.2817 0.8744 665 1.0956 35850400
0.2383 0.8809 670 1.0975 36121904
0.1814 0.8875 675 1.0993 36393368
0.2137 0.8941 680 1.0943 36664864
0.1752 0.9006 685 1.0941 36939200
0.2005 0.9072 690 1.0983 37205904
0.3429 0.9138 695 1.0970 37482984
0.2312 0.9204 700 1.0943 37755048
0.1952 0.9269 705 1.0958 38019952
0.2054 0.9335 710 1.0963 38291888
0.2247 0.9401 715 1.0958 38561640
0.1912 0.9467 720 1.0958 38835512
0.2334 0.9532 725 1.0964 39110024
0.1795 0.9598 730 1.0948 39382208
0.1963 0.9664 735 1.0946 39654856
0.2492 0.9730 740 1.0952 39930376
0.2831 0.9795 745 1.0927 40202200
0.2232 0.9861 750 1.0936 40469640
0.1724 0.9927 755 1.0955 40736256
0.2259 0.9993 760 1.0940 41008256
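
As a rough sanity check on the token counts, the final row implies an average sequence length of about 420 tokens. A back-of-the-envelope computation, assuming every optimizer step processes the full effective batch of 128 sequences:

```python
# Rough average sequence length implied by the final table row.
total_tokens = 41_008_256   # input tokens seen at step 760
steps = 760                 # optimizer steps in the last row
effective_batch = 128       # total_train_batch_size from the hyperparameters

sequences = steps * effective_batch   # 97,280 sequences
print(total_tokens / sequences)       # -> approx. 421.6 tokens per sequence
```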

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
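
To reproduce the training environment, these versions can be checked at runtime. A small sketch using the standard import names:

```python
import transformers, torch, datasets, tokenizers

# Versions reported in this card; mismatches may change results slightly.
expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
for module in (transformers, torch, datasets, tokenizers):
    print(module.__name__, module.__version__,
          "(expected", expected[module.__name__] + ")")
```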
Model size: 2.61B parameters (Safetensors, BF16)

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd1

  • Base model: google/gemma-2-2b
  • This model: fine-tuned from the base model