collapse_gemma-2-2b_hs2_replace_iter2_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.5087
  • Num Input Tokens Seen: 7865232
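For causal language-model fine-tuning, the Trainer's evaluation loss is normally the mean per-token cross-entropy in nats. Under that assumption (the card does not state it explicitly), the final loss converts to an approximate perplexity by exponentiation:

```python
import math

# Final evaluation loss reported above.
# Assumption: mean token-level cross-entropy using the natural log.
eval_loss = 1.5087

# Perplexity is exp(cross-entropy) when the loss is measured in nats.
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.2f}")
```

This gives roughly 4.52. It is only a summary of the same number, not an independent evaluation.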

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
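Two quantities follow from the list above. total_train_batch_size is train_batch_size times gradient_accumulation_steps (times the number of devices, assumed here to be 1). constant_with_warmup ramps the learning rate linearly from 0 over the warmup steps and then holds it at learning_rate. The sketch below recomputes both in plain Python. The total of about 145 optimizer steps is an inference from the epoch column of the results table, not a value stated in the card. Rounding warmup steps up with ceil is intended to match how the Transformers Trainer converts warmup_ratio, but treat that as an assumption.

```python
import math

# Hyperparameters copied from the list above.
learning_rate = 8e-06
train_batch_size = 8              # per-device micro-batch size
gradient_accumulation_steps = 16
warmup_ratio = 0.05
num_devices = 1                   # assumption: single-device training

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: total optimizer steps inferred from the results table,
# where step 5 is logged at epoch 0.0345 (about 145 steps per epoch).
num_training_steps = round(5 / 0.0345)
num_warmup_steps = math.ceil(num_training_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Learning rate under a constant-with-warmup schedule:
    linear ramp from 0 to learning_rate, then constant."""
    if step < num_warmup_steps:
        return learning_rate * step / max(1, num_warmup_steps)
    return learning_rate


print("total batch size:", total_train_batch_size)
print("training steps:", num_training_steps, "warmup steps:", num_warmup_steps)
for step in (0, 4, 8, 100):
    print(step, lr_at(step))
```

With these assumptions, warmup lasts 8 of roughly 145 steps. After that, the learning rate stays at 8e-06 for the rest of the single epoch, because the schedule has no decay phase.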

Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen
------------- | ----- | ---- | --------------- | -----------------
No log | 0 | 0 | 1.3956 | 0
1.7513 | 0.0345 | 5 | 1.3040 | 274624
1.4363 | 0.0690 | 10 | 1.1967 | 546200
1.2021 | 0.1035 | 15 | 1.1697 | 817584
1.0165 | 0.1380 | 20 | 1.1870 | 1088176
0.9588 | 0.1725 | 25 | 1.2338 | 1358336
0.856 | 0.2070 | 30 | 1.3377 | 1637992
0.6142 | 0.2415 | 35 | 1.3806 | 1911376
0.5705 | 0.2760 | 40 | 1.4600 | 2181176
0.5098 | 0.3105 | 45 | 1.5034 | 2462856
0.3225 | 0.3450 | 50 | 1.5081 | 2737752
0.3129 | 0.3795 | 55 | 1.5481 | 3012656
0.3444 | 0.4140 | 60 | 1.4783 | 3279744
0.2324 | 0.4485 | 65 | 1.4703 | 3547808
0.234 | 0.4830 | 70 | 1.4699 | 3817328
0.2621 | 0.5175 | 75 | 1.4305 | 4097184
0.1199 | 0.5520 | 80 | 1.4580 | 4367848
0.1915 | 0.5865 | 85 | 1.4274 | 4640592
0.2214 | 0.6210 | 90 | 1.4877 | 4922032
0.1506 | 0.6555 | 95 | 1.4413 | 5193088
0.1584 | 0.6900 | 100 | 1.4564 | 5464864
0.2169 | 0.7245 | 105 | 1.4504 | 5739032
0.1219 | 0.7589 | 110 | 1.4286 | 6012736
0.1687 | 0.7934 | 115 | 1.4840 | 6274808
0.1776 | 0.8279 | 120 | 1.4578 | 6548312
0.1197 | 0.8624 | 125 | 1.4703 | 6821112
0.1035 | 0.8969 | 130 | 1.4563 | 7098736
0.1298 | 0.9314 | 135 | 1.4510 | 7369552
0.0958 | 0.9659 | 140 | 1.4814 | 7640632
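In the results table, validation loss reaches its minimum early and then rises while training loss keeps falling. The short script below reads the logged rows (copied from the table, with training loss omitted) to find that minimum and to estimate the average input tokens per optimizer step. "Input tokens seen" is the Trainer's running count and may include padding tokens. Dividing by 128 assumes every step used a full batch.

```python
# (step, validation_loss, input_tokens_seen), copied from the table above.
log = [
    (0, 1.3956, 0), (5, 1.3040, 274624), (10, 1.1967, 546200),
    (15, 1.1697, 817584), (20, 1.1870, 1088176), (25, 1.2338, 1358336),
    (30, 1.3377, 1637992), (35, 1.3806, 1911376), (40, 1.4600, 2181176),
    (45, 1.5034, 2462856), (50, 1.5081, 2737752), (55, 1.5481, 3012656),
    (60, 1.4783, 3279744), (65, 1.4703, 3547808), (70, 1.4699, 3817328),
    (75, 1.4305, 4097184), (80, 1.4580, 4367848), (85, 1.4274, 4640592),
    (90, 1.4877, 4922032), (95, 1.4413, 5193088), (100, 1.4564, 5464864),
    (105, 1.4504, 5739032), (110, 1.4286, 6012736), (115, 1.4840, 6274808),
    (120, 1.4578, 6548312), (125, 1.4703, 6821112), (130, 1.4563, 7098736),
    (135, 1.4510, 7369552), (140, 1.4814, 7640632),
]

# Evaluation point with the lowest validation loss.
best_step, best_loss, _ = min(log, key=lambda row: row[1])
print(f"lowest validation loss {best_loss} at step {best_step}")

# Average input tokens consumed per optimizer step (128 sequences per step).
last_step, _, last_tokens = log[-1]
tokens_per_step = last_tokens / last_step
print(f"~{tokens_per_step:.0f} tokens per step, "
      f"~{tokens_per_step / 128:.0f} per sequence")
```

The script reports the lowest validation loss, 1.1697, at step 15. Every later evaluation is higher, including the final reported loss of 1.5087.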

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
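To reproduce the run, you can compare the installed library versions with the list above. The sketch below uses only the standard library's importlib.metadata. It assumes the pip distribution names transformers, torch, datasets and tokenizers (PyTorch installs as "torch"). The lookup function is injectable so the check can be tested without those packages.

```python
from importlib import metadata

# Versions listed under "Framework versions", keyed by pip distribution name.
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(expected, lookup=metadata.version):
    """Return {package: (expected, installed_or_None)} for every package
    whose installed version string differs from the expected one."""
    mismatches = {}
    for name, want in expected.items():
        try:
            have = lookup(name)
        except metadata.PackageNotFoundError:
            have = None  # package is not installed
        if have != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in version_mismatches(EXPECTED).items():
        print(f"{name}: expected {want}, found {have}")
```

The comparison is an exact string match. A CPU-only torch build reporting "2.4.0" is therefore flagged against "2.4.0+cu121", which may or may not matter for your use.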

Model weights

  • Format: Safetensors
  • Model size: 2.61B params
  • Tensor type: BF16
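The size and tensor type give a rough lower bound on the memory needed to hold the weights. BF16 stores each parameter in 2 bytes. The estimate below covers the weights only and excludes activations, KV cache and framework overhead. It is also approximate because 2.61B is a rounded figure.

```python
# Figures from the card: about 2.61B parameters stored as BF16.
num_params = 2.61e9
bytes_per_param = 2  # bfloat16 uses 16 bits per value

weight_bytes = num_params * bytes_per_param
print(f"{weight_bytes / 1e9:.2f} GB ({weight_bytes / 2**30:.2f} GiB)")
```

This comes to about 5.22 GB (about 4.86 GiB) of weights.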

Model tree for jkazdan/collapse_gemma-2-2b_hs2_replace_iter2_sftsd2

  • Base model: google/gemma-2-2b
  • Fine-tuned from the base model: this model