metadata
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter14_sftsd2
    results: []

collapse_gemma-2-2b_hs2_replace_iter14_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5786
  • Num Input Tokens Seen: 4691896
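To put the evaluation loss on a more intuitive scale, it can be converted to perplexity. This is a minimal sketch, assuming the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language model training, but not stated in this card). The helper name `perplexity` is hypothetical.

```python
import math

# Evaluation loss as reported in this card.
EVAL_LOSS = 2.5786


def perplexity(mean_nll_nats: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity."""
    return math.exp(mean_nll_nats)


# Assumption: the reported loss is a mean token-level cross-entropy in nats.
eval_ppl = perplexity(EVAL_LOSS)
print(f"Evaluation perplexity: {eval_ppl:.2f}")
```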

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
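The hyperparameters above are related by simple arithmetic, sketched here in plain Python. Two values are assumptions rather than facts from this card: a single training device (consistent with 8 × 16 = 128) and roughly 98 optimizer steps in the epoch (estimated from the logged epoch fractions). The warmup rule mirrors the usual behavior of a constant-with-warmup schedule: a linear ramp to the base rate, then constant. The function names are hypothetical.

```python
import math

# Values reported in this card.
LEARNING_RATE = 8e-06
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 16
WARMUP_RATIO = 0.05

# Assumptions (not stated in the card).
NUM_DEVICES = 1   # a single device reproduces total_train_batch_size = 128
TOTAL_STEPS = 98  # estimated from the logged epoch fractions


def effective_batch_size(per_device: int, accum_steps: int, devices: int = 1) -> int:
    """Number of examples contributing to each optimizer update."""
    return per_device * accum_steps * devices


def lr_at_step(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then a constant learning rate."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


total_batch = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)

print(f"Effective batch size: {total_batch}")
print(f"Warmup steps: {warmup_steps}")
for step in (0, 1, warmup_steps, TOTAL_STEPS):
    print(f"step {step}: lr = {lr_at_step(step, LEARNING_RATE, warmup_steps):.2e}")
```

Under these assumptions the warmup covers only the first few optimizer steps, so almost all of the epoch runs at the full learning rate.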

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.5832        | 0.0511 | 5    | 1.2796          | 241616            |
| 0.8587        | 0.1021 | 10   | 1.3016          | 479616            |
| 0.4185        | 0.1532 | 15   | 1.5403          | 721488            |
| 0.2308        | 0.2042 | 20   | 1.7468          | 964136            |
| 0.1045        | 0.2553 | 25   | 2.0213          | 1208880           |
| 0.0619        | 0.3063 | 30   | 2.2027          | 1454104           |
| 0.0318        | 0.3574 | 35   | 2.3840          | 1702688           |
| 0.0249        | 0.4084 | 40   | 2.4977          | 1942392           |
| 0.0229        | 0.4595 | 45   | 2.5368          | 2183280           |
| 0.0206        | 0.5105 | 50   | 2.5589          | 2426192           |
| 0.0223        | 0.5616 | 55   | 2.5742          | 2665256           |
| 0.0204        | 0.6126 | 60   | 2.5825          | 2909424           |
| 0.0209        | 0.6637 | 65   | 2.5771          | 3148624           |
| 0.0203        | 0.7147 | 70   | 2.5744          | 3384112           |
| 0.02          | 0.7658 | 75   | 2.5874          | 3631480           |
| 0.0222        | 0.8168 | 80   | 2.5799          | 3869320           |
| 0.0208        | 0.8679 | 85   | 2.5673          | 4113768           |
| 0.0216        | 0.9190 | 90   | 2.5709          | 4346680           |
| 0.0211        | 0.9700 | 95   | 2.5779          | 4595608           |
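The logged values above show training loss falling toward 0.02 while validation loss rises from its early minimum. The sketch below copies the step, training loss, and validation loss columns from the log and locates the checkpoint with the lowest validation loss. The variable names are hypothetical, and `None` stands in for the "No log" entry at step 0.

```python
# (step, training_loss, validation_loss), copied from the training log.
LOG = [
    (0, None, 1.3909),
    (5, 1.5832, 1.2796),
    (10, 0.8587, 1.3016),
    (15, 0.4185, 1.5403),
    (20, 0.2308, 1.7468),
    (25, 0.1045, 2.0213),
    (30, 0.0619, 2.2027),
    (35, 0.0318, 2.3840),
    (40, 0.0249, 2.4977),
    (45, 0.0229, 2.5368),
    (50, 0.0206, 2.5589),
    (55, 0.0223, 2.5742),
    (60, 0.0204, 2.5825),
    (65, 0.0209, 2.5771),
    (70, 0.0203, 2.5744),
    (75, 0.02, 2.5874),
    (80, 0.0222, 2.5799),
    (85, 0.0208, 2.5673),
    (90, 0.0216, 2.5709),
    (95, 0.0211, 2.5779),
]

# Row with the lowest validation loss.
best_step, _, best_val = min(LOG, key=lambda row: row[2])

# Compare the first and last logged validation losses.
initial_val = LOG[0][2]
final_step, final_train, final_val = LOG[-1]

print(f"Lowest validation loss {best_val} at step {best_step}")
print(f"Validation loss change over training: {final_val - initial_val:+.4f}")
print(f"Final training loss {final_train} at step {final_step}")
```

A widening gap of this kind between training and validation loss is commonly read as overfitting to the training data, though this card does not describe the dataset or say whether the effect was intended.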

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1