---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_accumulatesubsample_iter20_sftsd1
    results: []
---

# collapse_gemma-2-2b_hs2_accumulatesubsample_iter20_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 1.2128
- Num Input Tokens Seen: 4956000
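
The snippet below is a minimal sketch of loading the model for generation with `transformers`. The Hub repository id is an assumption inferred from the model name and is not stated in this card; adjust it to the actual path.

```python
# Minimal usage sketch. The repo id below is an ASSUMPTION, not confirmed by the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter20_sftsd1"  # assumed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Gemma-2 was trained in bfloat16
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```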

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
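
As a rough sketch, these settings map onto `transformers.TrainingArguments` (as consumed by TRL's `SFTTrainer`) as shown below. Only the listed hyperparameters come from this card; `output_dir` is a placeholder, and the dataset and any TRL-specific options are unknown.

```python
# Hedged reconstruction of the training configuration; values are copied from
# the hyperparameter list above, everything else is a placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter20_sftsd1",  # placeholder
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,  # 8 per device x 16 steps = 128 total, matching the card
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
)
```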

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.3061        | 0.0526 | 5    | 1.2755          | 257768            |
| 1.1139        | 0.1053 | 10   | 1.2161          | 516928            |
| 0.8992        | 0.1579 | 15   | 1.2251          | 774912            |
| 0.7783        | 0.2105 | 20   | 1.2543          | 1046504           |
| 0.6654        | 0.2632 | 25   | 1.2786          | 1304672           |
| 0.6199        | 0.3158 | 30   | 1.2854          | 1564472           |
| 0.5221        | 0.3684 | 35   | 1.2730          | 1825704           |
| 0.4487        | 0.4211 | 40   | 1.2795          | 2083416           |
| 0.467         | 0.4737 | 45   | 1.2633          | 2341304           |
| 0.4486        | 0.5263 | 50   | 1.2577          | 2609808           |
| 0.4169        | 0.5789 | 55   | 1.2187          | 2865536           |
| 0.3921        | 0.6316 | 60   | 1.2464          | 3125408           |
| 0.3376        | 0.6842 | 65   | 1.2217          | 3387088           |
| 0.3697        | 0.7368 | 70   | 1.2219          | 3650704           |
| 0.3067        | 0.7895 | 75   | 1.2148          | 3918312           |
| 0.3436        | 0.8421 | 80   | 1.2127          | 4176968           |
| 0.3345        | 0.8947 | 85   | 1.2084          | 4435856           |
| 0.3397        | 0.9474 | 90   | 1.2054          | 4698528           |
| 0.2657        | 1.0    | 95   | 1.2128          | 4956000           |
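
For reference, if the validation loss is the standard mean token cross-entropy in nats, the final loss of 1.2128 corresponds to a perplexity of about exp(1.2128) ≈ 3.36.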

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1