collapse_gemma-2-2b_hs2_accumulatesubsample_iter5_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1596
  • Num Input Tokens Seen: 5092792
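
The checkpoint loads like any other gemma-2-2b fine-tune. A minimal inference sketch, assuming the standard transformers causal-LM API (the prompt and generation settings are illustrative, not part of this card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Usage sketch: the repo id is from this card; the prompt below is a
# placeholder chosen for illustration.
repo = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter5_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```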

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
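
A hedged sketch of the equivalent Hugging Face TrainingArguments follows; the output directory is named after this card, and the dataset/Trainer wiring is omitted since the training data is not documented:

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above. The effective
# batch size is per_device_train_batch_size * gradient_accumulation_steps
# = 8 * 16 = 128, matching total_train_batch_size.
args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter5_sftsd1",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption: inferred from the checkpoint's BF16 tensor type
)
```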

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.396         | 0.0535 | 5    | 1.2688          | 272480            |
| 1.1748        | 0.1070 | 10   | 1.1838          | 547328            |
| 1.12          | 0.1605 | 15   | 1.1718          | 821600            |
| 1.0808        | 0.2140 | 20   | 1.1633          | 1094936           |
| 0.8774        | 0.2676 | 25   | 1.1724          | 1374424           |
| 0.7922        | 0.3211 | 30   | 1.1880          | 1646160           |
| 0.8423        | 0.3746 | 35   | 1.1878          | 1922560           |
| 0.7508        | 0.4281 | 40   | 1.1777          | 2201456           |
| 0.6903        | 0.4816 | 45   | 1.1815          | 2480912           |
| 0.6497        | 0.5351 | 50   | 1.1695          | 2756496           |
| 0.6544        | 0.5886 | 55   | 1.1748          | 3029896           |
| 0.5257        | 0.6421 | 60   | 1.1744          | 3298912           |
| 0.607         | 0.6957 | 65   | 1.1630          | 3580736           |
| 0.5229        | 0.7492 | 70   | 1.1702          | 3850984           |
| 0.4844        | 0.8027 | 75   | 1.1632          | 4119888           |
| 0.5335        | 0.8562 | 80   | 1.1608          | 4389384           |
| 0.539         | 0.9097 | 85   | 1.1614          | 4662512           |
| 0.6048        | 0.9632 | 90   | 1.1580          | 4929312           |
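
Assuming the reported losses are mean token-level cross-entropy in nats (the transformers default for causal LMs), perplexity is exp(loss); the final evaluation loss of 1.1596 corresponds to a perplexity of roughly 3.19:

```python
import math

# Perplexity from mean cross-entropy loss (assumption: loss is in nats).
print(math.exp(1.1596))  # ~3.19
```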

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Model details

  • Model size: 2.61B params (Safetensors)
  • Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter5_sftsd1

  • Base model: google/gemma-2-2b