metadata
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter6_sftsd2
    results: []

collapse_gemma-2-2b_hs2_replace_iter6_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5403
  • Num Input Tokens Seen: 8006024
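The loss above can be easier to read as a perplexity. The sketch below assumes the reported value is a mean per-token cross-entropy in nats, which is what the Trainer normally logs for causal language models but which this card does not state. The variable names are illustrative only.

```python
import math

# Values copied from this card.
final_eval_loss = 2.5403    # evaluation loss reported at the end of training
initial_eval_loss = 1.3956  # validation loss logged at step 0

# Assumption: the loss is a mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
final_perplexity = math.exp(final_eval_loss)
initial_perplexity = math.exp(initial_eval_loss)

print(f"perplexity at step 0:      {initial_perplexity:.2f}")
print(f"perplexity after training: {final_perplexity:.2f}")
```

Under that assumption, a higher final perplexity than at step 0 means the fine-tuned model fits the evaluation set worse than the base model does.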

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
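As a rough check on how these values relate, the sketch below recomputes the effective batch size and estimates the warmup length. The single-device count and the total step count are assumptions: the card states neither, and the step count is estimated from the last logged row of the results table (step 155 at epoch 0.9760). The schedule function mirrors the usual linear-warmup-then-constant shape and is not taken from the training script.

```python
import math

# Hyperparameters copied from this card.
learning_rate = 8e-06
train_batch_size = 8
gradient_accumulation_steps = 16
warmup_ratio = 0.05

# Assumption: one device. The card does not list a device count, but
# 8 * 16 * 1 matches the reported total_train_batch_size of 128.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: total optimizer steps estimated from the results table.
estimated_total_steps = round(155 / 0.9760)
warmup_steps = math.ceil(estimated_total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Linear warmup from 0 to learning_rate, then constant (illustrative)."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print(total_train_batch_size, estimated_total_steps, warmup_steps)
print([lr_at(s) for s in (0, 4, 8, 100)])
```

With these estimates the warmup covers only the first few optimizer steps, so almost all of the epoch runs at the full learning rate.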

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.6849        | 0.0315 | 5    | 1.3074          | 251000            |
| 1.118         | 0.0630 | 10   | 1.2395          | 501688            |
| 0.7452        | 0.0945 | 15   | 1.3250          | 754952            |
| 0.6083        | 0.1259 | 20   | 1.4758          | 1008248           |
| 0.3921        | 0.1574 | 25   | 1.6064          | 1265824           |
| 0.259         | 0.1889 | 30   | 1.8043          | 1526472           |
| 0.1544        | 0.2204 | 35   | 1.9984          | 1787752           |
| 0.0991        | 0.2519 | 40   | 2.1525          | 2040120           |
| 0.0448        | 0.2834 | 45   | 2.2692          | 2285400           |
| 0.0467        | 0.3148 | 50   | 2.3345          | 2536824           |
| 0.0435        | 0.3463 | 55   | 2.4326          | 2790040           |
| 0.033         | 0.3778 | 60   | 2.5077          | 3046656           |
| 0.031         | 0.4093 | 65   | 2.5876          | 3295608           |
| 0.0311        | 0.4408 | 70   | 2.5704          | 3545992           |
| 0.0287        | 0.4723 | 75   | 2.5464          | 3802920           |
| 0.0257        | 0.5037 | 80   | 2.5635          | 4056400           |
| 0.0303        | 0.5352 | 85   | 2.5473          | 4310104           |
| 0.0252        | 0.5667 | 90   | 2.5338          | 4566456           |
| 0.0271        | 0.5982 | 95   | 2.5463          | 4822016           |
| 0.0269        | 0.6297 | 100  | 2.5515          | 5074048           |
| 0.0264        | 0.6612 | 105  | 2.5565          | 5332864           |
| 0.0272        | 0.6926 | 110  | 2.5661          | 5586528           |
| 0.025         | 0.7241 | 115  | 2.5334          | 5839624           |
| 0.0264        | 0.7556 | 120  | 2.5193          | 6095336           |
| 0.0252        | 0.7871 | 125  | 2.5051          | 6352376           |
| 0.0243        | 0.8186 | 130  | 2.5119          | 6603584           |
| 0.0281        | 0.8501 | 135  | 2.5157          | 6852952           |
| 0.0261        | 0.8815 | 140  | 2.5087          | 7101856           |
| 0.0255        | 0.9130 | 145  | 2.5109          | 7353304           |
| 0.0253        | 0.9445 | 150  | 2.5341          | 7598048           |
| 0.0262        | 0.9760 | 155  | 2.5437          | 7851000           |
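In the results above, validation loss is lowest early in training and then rises while training loss keeps falling. A minimal sketch for locating that turning point is shown below. It uses only an excerpt of the logged rows (steps 0 to 30 plus the last row), and the tuple layout is an illustrative choice, not a format produced by the Trainer.

```python
# (step, training_loss, validation_loss) rows copied from the results table.
# Only an excerpt is included to keep the sketch short; training loss is
# None at step 0 because the table shows "No log" there.
rows = [
    (0, None, 1.3956),
    (5, 1.6849, 1.3074),
    (10, 1.118, 1.2395),
    (15, 0.7452, 1.3250),
    (20, 0.6083, 1.4758),
    (25, 0.3921, 1.6064),
    (30, 0.259, 1.8043),
    (155, 0.0262, 2.5437),
]

# Step with the lowest validation loss among the rows above.
best_step, _, best_val_loss = min(rows, key=lambda row: row[2])

# Last logged row, for comparison with the best one.
final_step, _, final_val_loss = rows[-1]

print(f"lowest validation loss {best_val_loss} at step {best_step}")
print(f"validation loss {final_val_loss} at step {final_step}")
```

A widening gap between training and validation loss of this kind is commonly read as overfitting, although the card gives no information about the training or evaluation data that would confirm it.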

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1