---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_replace_iter17_sftsd1
  results: []
---

# collapse_gemma-2-2b_hs2_replace_iter17_sftsd1

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 2.5467
- Num Input Tokens Seen: 4476864
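
For a quick sanity check, the checkpoint can be loaded like any other causal LM with `transformers`. The snippet below is a minimal sketch; the Hub repository id `RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter17_sftsd1` and the example prompt are assumptions, not part of the original card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub repository id; replace with the actual location of this checkpoint.
model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter17_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # gemma-2-2b fits comfortably in bf16 on a single GPU
    device_map="auto",
)

prompt = "The capital of France is"  # placeholder prompt for a smoke test
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```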

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged reconstruction of the trainer setup follows the list):

- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 1
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
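
The `trl` and `sft` tags indicate the model was fine-tuned with TRL's supervised fine-tuning trainer. The sketch below maps the hyperparameters above onto `SFTConfig`; the dataset files, `output_dir`, text column, bf16 precision, and the evaluation cadence (every 5 steps, inferred from the results table below) are assumptions, since the card does not state them.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder data: the card lists the training/evaluation data as unknown.
train_dataset = load_dataset("json", data_files="train.jsonl", split="train")
eval_dataset = load_dataset("json", data_files="eval.jsonl", split="train")

config = SFTConfig(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter17_sftsd1",  # assumed
    dataset_text_field="text",        # assumed column name in the placeholder data
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,   # 8 x 16 = total train batch size of 128
    seed=1,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,                        # assumed precision, not stated in the card
    eval_strategy="steps",
    eval_steps=5,                     # matches the validation cadence in the table below
    logging_steps=5,
)

trainer = SFTTrainer(
    model="google/gemma-2-2b",
    args=config,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
```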

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.5537        | 0.0511 | 5    | 1.2778          | 222904            |
| 0.7952        | 0.1022 | 10   | 1.2852          | 448760            |
| 0.5611        | 0.1533 | 15   | 1.5050          | 675280            |
| 0.2536        | 0.2043 | 20   | 1.6999          | 905104            |
| 0.0968        | 0.2554 | 25   | 1.9531          | 1131936           |
| 0.1019        | 0.3065 | 30   | 2.1577          | 1361488           |
| 0.0495        | 0.3576 | 35   | 2.2738          | 1602552           |
| 0.0367        | 0.4087 | 40   | 2.3801          | 1837384           |
| 0.0268        | 0.4598 | 45   | 2.4756          | 2067480           |
| 0.0231        | 0.5109 | 50   | 2.5173          | 2301720           |
| 0.0267        | 0.5619 | 55   | 2.5333          | 2537344           |
| 0.0236        | 0.6130 | 60   | 2.5332          | 2774360           |
| 0.024         | 0.6641 | 65   | 2.5395          | 3004064           |
| 0.0218        | 0.7152 | 70   | 2.5456          | 3227784           |
| 0.0214        | 0.7663 | 75   | 2.5475          | 3463792           |
| 0.0219        | 0.8174 | 80   | 2.5494          | 3696792           |
| 0.0242        | 0.8685 | 85   | 2.5434          | 3925432           |
| 0.024         | 0.9195 | 90   | 2.5408          | 4158728           |
| 0.0228        | 0.9706 | 95   | 2.5444          | 4384712           |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1