# collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 1.1029
- Num Input Tokens Seen: 41091672

## Model description

More information needed

## Intended uses & limitations

More information needed
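
Pending fuller documentation, here is a minimal loading sketch. It assumes the standard `transformers` causal-LM API and that `accelerate` is installed for `device_map="auto"`; the prompt format used during fine-tuning is not documented in this card, so plain-text generation is shown.

```python
# Minimal loading sketch (an assumption-laden example, not documented usage).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the published checkpoint is stored in BF16
    device_map="auto",           # requires accelerate
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```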

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent `TrainingArguments` follows the list):

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128 (train_batch_size 8 × gradient_accumulation_steps 16)
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
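
Below is a hedged reconstruction of the equivalent `TrainingArguments`, built only from the list above; it is not the author's actual training script. The dataset, model initialization, and `Trainer` wiring are not documented here, and the `bf16` flag is an assumption based on the published BF16 weights. Adam's betas and epsilon match the `transformers` defaults, so they are left unset.

```python
# Hedged reconstruction of the hyperparameters above (transformers 4.44 names).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd2",
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,   # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,                        # assumption: matches the BF16 checkpoint
)
```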

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.5665 | 0.0066 | 5 | 1.3873 | 272560 |
| 1.5456 | 0.0132 | 10 | 1.3529 | 547080 |
| 1.4344 | 0.0198 | 15 | 1.2836 | 822880 |
| 1.4512 | 0.0264 | 20 | 1.2345 | 1089864 |
| 1.3462 | 0.0330 | 25 | 1.1901 | 1361576 |
| 1.1712 | 0.0396 | 30 | 1.1835 | 1634504 |
| 1.0826 | 0.0462 | 35 | 1.1964 | 1895936 |
| 0.9291 | 0.0527 | 40 | 1.1914 | 2166120 |
| 0.8296 | 0.0593 | 45 | 1.2208 | 2435904 |
| 0.6654 | 0.0659 | 50 | 1.2499 | 2706240 |
| 0.6401 | 0.0725 | 55 | 1.2356 | 2984976 |
| 0.6449 | 0.0791 | 60 | 1.2089 | 3257728 |
| 0.5585 | 0.0857 | 65 | 1.2026 | 3526976 |
| 0.468 | 0.0923 | 70 | 1.2120 | 3804888 |
| 0.5271 | 0.0989 | 75 | 1.2040 | 4078544 |
| 0.3901 | 0.1055 | 80 | 1.1976 | 4356048 |
| 0.4389 | 0.1121 | 85 | 1.2049 | 4621624 |
| 0.3482 | 0.1187 | 90 | 1.1972 | 4888632 |
| 0.3224 | 0.1253 | 95 | 1.1926 | 5152168 |
| 0.4305 | 0.1319 | 100 | 1.1944 | 5423968 |
| 0.3758 | 0.1385 | 105 | 1.1825 | 5697240 |
| 0.3646 | 0.1450 | 110 | 1.1919 | 5971384 |
| 0.3215 | 0.1516 | 115 | 1.1776 | 6240360 |
| 0.3273 | 0.1582 | 120 | 1.1907 | 6509288 |
| 0.3152 | 0.1648 | 125 | 1.1786 | 6779048 |
| 0.2365 | 0.1714 | 130 | 1.1833 | 7048200 |
| 0.3342 | 0.1780 | 135 | 1.1750 | 7316656 |
| 0.3586 | 0.1846 | 140 | 1.1774 | 7590728 |
| 0.2927 | 0.1912 | 145 | 1.1737 | 7859680 |
| 0.3788 | 0.1978 | 150 | 1.1760 | 8126224 |
| 0.2964 | 0.2044 | 155 | 1.1741 | 8403808 |
| 0.2938 | 0.2110 | 160 | 1.1677 | 8672216 |
| 0.2518 | 0.2176 | 165 | 1.1735 | 8946264 |
| 0.3334 | 0.2242 | 170 | 1.1647 | 9208352 |
| 0.311 | 0.2308 | 175 | 1.1647 | 9477208 |
| 0.3065 | 0.2373 | 180 | 1.1620 | 9748024 |
| 0.2517 | 0.2439 | 185 | 1.1613 | 10021768 |
| 0.2672 | 0.2505 | 190 | 1.1569 | 10293208 |
| 0.2611 | 0.2571 | 195 | 1.1545 | 10569280 |
| 0.2265 | 0.2637 | 200 | 1.1548 | 10840984 |
| 0.3068 | 0.2703 | 205 | 1.1520 | 11116568 |
| 0.2929 | 0.2769 | 210 | 1.1568 | 11394928 |
| 0.3351 | 0.2835 | 215 | 1.1547 | 11666600 |
| 0.2687 | 0.2901 | 220 | 1.1544 | 11946656 |
| 0.2501 | 0.2967 | 225 | 1.1479 | 12224240 |
| 0.1991 | 0.3033 | 230 | 1.1520 | 12500672 |
| 0.2434 | 0.3099 | 235 | 1.1477 | 12767840 |
| 0.1667 | 0.3165 | 240 | 1.1453 | 13035688 |
| 0.2564 | 0.3231 | 245 | 1.1509 | 13312232 |
| 0.2856 | 0.3297 | 250 | 1.1436 | 13584328 |
| 0.305 | 0.3362 | 255 | 1.1425 | 13853288 |
| 0.2765 | 0.3428 | 260 | 1.1456 | 14113512 |
| 0.2209 | 0.3494 | 265 | 1.1455 | 14385280 |
| 0.2125 | 0.3560 | 270 | 1.1410 | 14660096 |
| 0.274 | 0.3626 | 275 | 1.1417 | 14931976 |
| 0.2181 | 0.3692 | 280 | 1.1411 | 15202008 |
| 0.2481 | 0.3758 | 285 | 1.1374 | 15468896 |
| 0.2629 | 0.3824 | 290 | 1.1372 | 15733744 |
| 0.2826 | 0.3890 | 295 | 1.1366 | 16004424 |
| 0.2646 | 0.3956 | 300 | 1.1363 | 16276088 |
| 0.2729 | 0.4022 | 305 | 1.1333 | 16547304 |
| 0.2735 | 0.4088 | 310 | 1.1350 | 16819224 |
| 0.2881 | 0.4154 | 315 | 1.1349 | 17088704 |
| 0.2208 | 0.4220 | 320 | 1.1304 | 17362560 |
| 0.1822 | 0.4285 | 325 | 1.1348 | 17632840 |
| 0.3197 | 0.4351 | 330 | 1.1306 | 17903232 |
| 0.1763 | 0.4417 | 335 | 1.1287 | 18171208 |
| 0.2851 | 0.4483 | 340 | 1.1333 | 18444312 |
| 0.2406 | 0.4549 | 345 | 1.1318 | 18716768 |
| 0.2571 | 0.4615 | 350 | 1.1291 | 18983016 |
| 0.3931 | 0.4681 | 355 | 1.1282 | 19256840 |
| 0.1952 | 0.4747 | 360 | 1.1287 | 19527776 |
| 0.227 | 0.4813 | 365 | 1.1282 | 19800232 |
| 0.2979 | 0.4879 | 370 | 1.1285 | 20074720 |
| 0.1515 | 0.4945 | 375 | 1.1280 | 20350824 |
| 0.336 | 0.5011 | 380 | 1.1254 | 20627392 |
| 0.2381 | 0.5077 | 385 | 1.1258 | 20900344 |
| 0.2331 | 0.5143 | 390 | 1.1253 | 21173120 |
| 0.2176 | 0.5209 | 395 | 1.1250 | 21442720 |
| 0.232 | 0.5274 | 400 | 1.1268 | 21711376 |
| 0.2648 | 0.5340 | 405 | 1.1246 | 21977752 |
| 0.2398 | 0.5406 | 410 | 1.1241 | 22247224 |
| 0.2246 | 0.5472 | 415 | 1.1245 | 22525976 |
| 0.2836 | 0.5538 | 420 | 1.1199 | 22795472 |
| 0.242 | 0.5604 | 425 | 1.1233 | 23063720 |
| 0.2369 | 0.5670 | 430 | 1.1230 | 23333144 |
| 0.2856 | 0.5736 | 435 | 1.1206 | 23599032 |
| 0.2595 | 0.5802 | 440 | 1.1208 | 23871616 |
| 0.2154 | 0.5868 | 445 | 1.1188 | 24144160 |
| 0.2541 | 0.5934 | 450 | 1.1208 | 24412552 |
| 0.2378 | 0.6000 | 455 | 1.1210 | 24683400 |
| 0.233 | 0.6066 | 460 | 1.1183 | 24956656 |
| 0.3136 | 0.6132 | 465 | 1.1211 | 25235888 |
| 0.2549 | 0.6197 | 470 | 1.1185 | 25505944 |
| 0.259 | 0.6263 | 475 | 1.1179 | 25776080 |
| 0.1539 | 0.6329 | 480 | 1.1197 | 26043984 |
| 0.2459 | 0.6395 | 485 | 1.1183 | 26318896 |
| 0.2342 | 0.6461 | 490 | 1.1182 | 26585616 |
| 0.2173 | 0.6527 | 495 | 1.1172 | 26862168 |
| 0.3048 | 0.6593 | 500 | 1.1172 | 27130760 |
| 0.2851 | 0.6659 | 505 | 1.1142 | 27397928 |
| 0.2091 | 0.6725 | 510 | 1.1148 | 27670712 |
| 0.3143 | 0.6791 | 515 | 1.1149 | 27933056 |
| 0.1672 | 0.6857 | 520 | 1.1152 | 28201952 |
| 0.3181 | 0.6923 | 525 | 1.1164 | 28477464 |
| 0.1914 | 0.6989 | 530 | 1.1174 | 28743664 |
| 0.2931 | 0.7055 | 535 | 1.1155 | 29016592 |
| 0.2285 | 0.7120 | 540 | 1.1133 | 29283872 |
| 0.2749 | 0.7186 | 545 | 1.1163 | 29554240 |
| 0.2901 | 0.7252 | 550 | 1.1145 | 29821128 |
| 0.2361 | 0.7318 | 555 | 1.1114 | 30095352 |
| 0.2654 | 0.7384 | 560 | 1.1125 | 30371160 |
| 0.1935 | 0.7450 | 565 | 1.1129 | 30645928 |
| 0.268 | 0.7516 | 570 | 1.1101 | 30919376 |
| 0.1795 | 0.7582 | 575 | 1.1139 | 31186848 |
| 0.2439 | 0.7648 | 580 | 1.1122 | 31459480 |
| 0.259 | 0.7714 | 585 | 1.1091 | 31733560 |
| 0.248 | 0.7780 | 590 | 1.1105 | 32003016 |
| 0.2186 | 0.7846 | 595 | 1.1106 | 32278448 |
| 0.1595 | 0.7912 | 600 | 1.1115 | 32538192 |
| 0.2058 | 0.7978 | 605 | 1.1117 | 32816064 |
| 0.2324 | 0.8044 | 610 | 1.1095 | 33087144 |
| 0.2045 | 0.8109 | 615 | 1.1094 | 33353000 |
| 0.2333 | 0.8175 | 620 | 1.1095 | 33621888 |
| 0.2159 | 0.8241 | 625 | 1.1076 | 33888104 |
| 0.2866 | 0.8307 | 630 | 1.1094 | 34159240 |
| 0.2268 | 0.8373 | 635 | 1.1101 | 34430064 |
| 0.1753 | 0.8439 | 640 | 1.1100 | 34700128 |
| 0.2076 | 0.8505 | 645 | 1.1089 | 34968768 |
| 0.1912 | 0.8571 | 650 | 1.1069 | 35250136 |
| 0.1534 | 0.8637 | 655 | 1.1074 | 35524024 |
| 0.1424 | 0.8703 | 660 | 1.1083 | 35789520 |
| 0.2325 | 0.8769 | 665 | 1.1076 | 36067376 |
| 0.2607 | 0.8835 | 670 | 1.1046 | 36340512 |
| 0.234 | 0.8901 | 675 | 1.1048 | 36603160 |
| 0.232 | 0.8967 | 680 | 1.1081 | 36872480 |
| 0.2998 | 0.9032 | 685 | 1.1080 | 37146736 |
| 0.1921 | 0.9098 | 690 | 1.1045 | 37414776 |
| 0.2492 | 0.9164 | 695 | 1.1060 | 37685600 |
| 0.27 | 0.9230 | 700 | 1.1068 | 37949648 |
| 0.2159 | 0.9296 | 705 | 1.1046 | 38226312 |
| 0.1912 | 0.9362 | 710 | 1.1062 | 38502072 |
| 0.23 | 0.9428 | 715 | 1.1076 | 38772744 |
| 0.3387 | 0.9494 | 720 | 1.1054 | 39041632 |
| 0.23 | 0.9560 | 725 | 1.1051 | 39313560 |
| 0.2785 | 0.9626 | 730 | 1.1065 | 39585992 |
| 0.2116 | 0.9692 | 735 | 1.1030 | 39856632 |
| 0.2378 | 0.9758 | 740 | 1.1040 | 40120176 |
| 0.2006 | 0.9824 | 745 | 1.1046 | 40392064 |
| 0.2418 | 0.9890 | 750 | 1.1024 | 40664776 |
| 0.2041 | 0.9955 | 755 | 1.1028 | 40931592 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
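
For reproduction, a quick sanity check that a local environment matches these pins (a sketch; nearby versions may well work, but these are the ones the card records):

```python
# Compare installed framework versions against the pins recorded above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
actual = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    marker = "OK" if actual[name] == want else "MISMATCH"
    print(f"{marker} {name}: have {actual[name]}, card records {want}")
```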