collapse_gemma-2-2b_hs2_replace_iter14_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5786
  • Num Input Tokens Seen: 4691896
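
The reported loss can be turned into a perplexity for easier comparison with other language models. The sketch below assumes the loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card.

```python
import math

# Evaluation loss copied from the results above.
eval_loss = 2.5786

# Assumption: the loss is a mean per-token cross-entropy measured in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"evaluation perplexity ~ {perplexity:.2f}")
```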

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
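
How the batch-size and scheduler settings above fit together can be sketched in plain Python. This is an illustration under stated assumptions, not the training script: the total step count of 98 is inferred from the results table (step 95 is logged at epoch 0.97), and rounding the warmup length up is an assumption about how the trainer converts the ratio into steps.

```python
import math

# Hyperparameters copied from the list above.
learning_rate = 8e-06
train_batch_size = 8
gradient_accumulation_steps = 16
total_train_batch_size = 128
warmup_ratio = 0.05

# Effective batch = per-device batch * accumulation steps * device count,
# so the reported total implies the number of devices.
num_devices = total_train_batch_size // (train_batch_size * gradient_accumulation_steps)

# Assumption: about 98 optimizer steps in the single epoch.
total_steps = 98
# Assumption: the warmup length is the ratio times total steps, rounded up.
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Constant-with-warmup: linear ramp from 0 to learning_rate, then flat."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print(f"devices implied: {num_devices}, warmup steps: {warmup_steps}")
for step in (0, 1, warmup_steps, total_steps - 1):
    print(f"step {step:>2}: lr = {lr_at(step):.2e}")
```

Under these assumptions the learning rate reaches its full value after the first few optimizer steps and stays there for the rest of the epoch, since this scheduler has no decay phase.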

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.5832        | 0.0511 | 5    | 1.2796          | 241616            |
| 0.8587        | 0.1021 | 10   | 1.3016          | 479616            |
| 0.4185        | 0.1532 | 15   | 1.5403          | 721488            |
| 0.2308        | 0.2042 | 20   | 1.7468          | 964136            |
| 0.1045        | 0.2553 | 25   | 2.0213          | 1208880           |
| 0.0619        | 0.3063 | 30   | 2.2027          | 1454104           |
| 0.0318        | 0.3574 | 35   | 2.3840          | 1702688           |
| 0.0249        | 0.4084 | 40   | 2.4977          | 1942392           |
| 0.0229        | 0.4595 | 45   | 2.5368          | 2183280           |
| 0.0206        | 0.5105 | 50   | 2.5589          | 2426192           |
| 0.0223        | 0.5616 | 55   | 2.5742          | 2665256           |
| 0.0204        | 0.6126 | 60   | 2.5825          | 2909424           |
| 0.0209        | 0.6637 | 65   | 2.5771          | 3148624           |
| 0.0203        | 0.7147 | 70   | 2.5744          | 3384112           |
| 0.02          | 0.7658 | 75   | 2.5874          | 3631480           |
| 0.0222        | 0.8168 | 80   | 2.5799          | 3869320           |
| 0.0208        | 0.8679 | 85   | 2.5673          | 4113768           |
| 0.0216        | 0.9190 | 90   | 2.5709          | 4346680           |
| 0.0211        | 0.9700 | 95   | 2.5779          | 4595608           |
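
The log above shows validation loss at its lowest at step 5 and rising afterwards while training loss falls to about 0.02. A short sketch for pulling those figures out of the logged rows follows; the tuple layout and variable names are illustrative choices, and the values are copied from the table.

```python
# (step, training_loss, validation_loss, input_tokens_seen), copied from the table.
# training_loss is None where the table reports "No log".
rows = [
    (0, None, 1.3909, 0),
    (5, 1.5832, 1.2796, 241616),
    (10, 0.8587, 1.3016, 479616),
    (15, 0.4185, 1.5403, 721488),
    (20, 0.2308, 1.7468, 964136),
    (25, 0.1045, 2.0213, 1208880),
    (30, 0.0619, 2.2027, 1454104),
    (35, 0.0318, 2.3840, 1702688),
    (40, 0.0249, 2.4977, 1942392),
    (45, 0.0229, 2.5368, 2183280),
    (50, 0.0206, 2.5589, 2426192),
    (55, 0.0223, 2.5742, 2665256),
    (60, 0.0204, 2.5825, 2909424),
    (65, 0.0209, 2.5771, 3148624),
    (70, 0.0203, 2.5744, 3384112),
    (75, 0.02, 2.5874, 3631480),
    (80, 0.0222, 2.5799, 3869320),
    (85, 0.0208, 2.5673, 4113768),
    (90, 0.0216, 2.5709, 4346680),
    (95, 0.0211, 2.5779, 4595608),
]

# Checkpoint with the lowest validation loss.
best = min(rows, key=lambda r: r[2])
final = rows[-1]

# Gap between the last logged validation loss and the best one.
val_gap = final[2] - best[2]

# Average number of input tokens consumed per optimizer step.
tokens_per_step = final[3] / final[0]

print(f"best validation loss {best[2]} at step {best[0]}")
print(f"final validation loss is higher by {val_gap:.4f}")
print(f"average tokens per step: {tokens_per_step:.0f}")
```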

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size: 2.61B params (Safetensors; tensor type BF16)

Model tree: RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter14_sftsd2 is fine-tuned from the base model google/gemma-2-2b.