collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0989
  • Num Input Tokens Seen: 13720456
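
No usage guidance is given on this card, so the following is a minimal inference sketch, assuming the standard transformers causal-LM API. The repo id is taken from this card; loading with device_map="auto" additionally requires the accelerate package.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id as listed on this card.
model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",           # requires `pip install accelerate`
)

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```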

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
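
The hyperparameters above map onto transformers.TrainingArguments roughly as below. This is an illustrative reconstruction, not the published training script; the dataset, data collator, and Trainer setup are unknown. The Adam betas and epsilon listed match the TrainingArguments defaults, so they are left implicit.

```python
from transformers import TrainingArguments

# Illustrative reconstruction of the reported hyperparameters.
# output_dir and bf16 are assumptions, not stated in the card.
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter2_sftsd2",
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,  # assumption: matches the BF16 checkpoint dtype
)
```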

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.5681        | 0.0206 | 5    | 1.3561          | 281208            |
| 1.3762        | 0.0412 | 10   | 1.2366          | 561408            |
| 1.2447        | 0.0618 | 15   | 1.1711          | 846520            |
| 1.2027        | 0.0824 | 20   | 1.1444          | 1123552           |
| 1.3092        | 0.1030 | 25   | 1.1172          | 1406248           |
| 1.1897        | 0.1236 | 30   | 1.1186          | 1693528           |
| 1.0685        | 0.1441 | 35   | 1.1195          | 1979856           |
| 0.9925        | 0.1647 | 40   | 1.1286          | 2262504           |
| 1.0026        | 0.1853 | 45   | 1.1277          | 2544760           |
| 0.9181        | 0.2059 | 50   | 1.1374          | 2825680           |
| 0.9007        | 0.2265 | 55   | 1.1411          | 3103440           |
| 0.8626        | 0.2471 | 60   | 1.1421          | 3388104           |
| 0.8576        | 0.2677 | 65   | 1.1406          | 3668152           |
| 0.9025        | 0.2883 | 70   | 1.1459          | 3947528           |
| 0.8566        | 0.3089 | 75   | 1.1449          | 4229392           |
| 0.8071        | 0.3295 | 80   | 1.1467          | 4514912           |
| 0.7788        | 0.3501 | 85   | 1.1398          | 4800168           |
| 0.7999        | 0.3707 | 90   | 1.1427          | 5085472           |
| 0.7548        | 0.3912 | 95   | 1.1401          | 5370096           |
| 0.7775        | 0.4118 | 100  | 1.1324          | 5654448           |
| 0.6659        | 0.4324 | 105  | 1.1390          | 5932488           |
| 0.7151        | 0.4530 | 110  | 1.1345          | 6217432           |
| 0.7126        | 0.4736 | 115  | 1.1303          | 6504472           |
| 0.5812        | 0.4942 | 120  | 1.1395          | 6786136           |
| 0.7462        | 0.5148 | 125  | 1.1331          | 7075544           |
| 0.6824        | 0.5354 | 130  | 1.1306          | 7349632           |
| 0.7777        | 0.5560 | 135  | 1.1333          | 7638056           |
| 0.614         | 0.5766 | 140  | 1.1285          | 7926232           |
| 0.6151        | 0.5972 | 145  | 1.1264          | 8206848           |
| 0.7309        | 0.6178 | 150  | 1.1235          | 8494256           |
| 0.6219        | 0.6384 | 155  | 1.1226          | 8771192           |
| 0.6518        | 0.6589 | 160  | 1.1194          | 9060384           |
| 0.6101        | 0.6795 | 165  | 1.1167          | 9344632           |
| 0.6374        | 0.7001 | 170  | 1.1139          | 9625824           |
| 0.6431        | 0.7207 | 175  | 1.1153          | 9909464           |
| 0.6351        | 0.7413 | 180  | 1.1112          | 10193712          |
| 0.6205        | 0.7619 | 185  | 1.1099          | 10473824          |
| 0.5593        | 0.7825 | 190  | 1.1086          | 10757760          |
| 0.6611        | 0.8031 | 195  | 1.1067          | 11044304          |
| 0.604         | 0.8237 | 200  | 1.1089          | 11335648          |
| 0.5985        | 0.8443 | 205  | 1.1045          | 11616672          |
| 0.6425        | 0.8649 | 210  | 1.1041          | 11904256          |
| 0.6244        | 0.8855 | 215  | 1.1036          | 12186800          |
| 0.4801        | 0.9060 | 220  | 1.1015          | 12472520          |
| 0.5418        | 0.9266 | 225  | 1.1026          | 12757120          |
| 0.5693        | 0.9472 | 230  | 1.0992          | 13037120          |
| 0.6361        | 0.9678 | 235  | 1.0997          | 13321752          |
| 0.5677        | 0.9884 | 240  | 1.0984          | 13608048          |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
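
To match the training environment, here is a quick version check against the pins above (a sketch; it assumes the standard import names for the Hugging Face stack):

```python
import datasets
import tokenizers
import torch
import transformers

# Versions pinned on this card.
expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    have = installed[name]
    status = "OK" if have == want else f"MISMATCH (card pins {want})"
    print(f"{name}=={have}: {status}")
```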