
collapse_gemma-2-2b_hs2_replace_iter13_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5914
  • Num Input Tokens Seen: 4764832
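As a quick-start aid, here is a minimal loading sketch using the standard transformers API. This is an illustrative example, not an official snippet from the model authors; the repository ID is this card's own path on the Hub, and the BF16 dtype matches the checkpoint's stored tensor type:

```python
# Minimal loading sketch (not an official example from the model authors).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter13_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint weights are stored in BF16
    device_map="auto",           # requires the `accelerate` package
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```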

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
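For reference only, the list above maps onto transformers.TrainingArguments roughly as follows. This is a reconstruction, not the authors' actual training script; output_dir is a placeholder and the bf16 flag is an assumption:

```python
# Hypothetical reconstruction of the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter13_sftsd2",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = total train batch size of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,     # "Adam with betas=(0.9,0.999)"
    adam_beta2=0.999,
    adam_epsilon=1e-8,  # "epsilon=1e-08"
    bf16=True,          # assumption: matches the checkpoint's BF16 tensor type
)
```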

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.6665        | 0.0511 | 5    | 1.2782          | 248088            |
| 0.7132        | 0.1021 | 10   | 1.3229          | 492944            |
| 0.4332        | 0.1532 | 15   | 1.5331          | 737200            |
| 0.2622        | 0.2042 | 20   | 1.7169          | 982128            |
| 0.136         | 0.2553 | 25   | 1.9671          | 1228256           |
| 0.077         | 0.3063 | 30   | 2.1798          | 1475016           |
| 0.0375        | 0.3574 | 35   | 2.3843          | 1718752           |
| 0.0255        | 0.4084 | 40   | 2.5202          | 1966432           |
| 0.0209        | 0.4595 | 45   | 2.5795          | 2208784           |
| 0.0194        | 0.5105 | 50   | 2.5974          | 2459208           |
| 0.0199        | 0.5616 | 55   | 2.6020          | 2700064           |
| 0.0211        | 0.6126 | 60   | 2.6116          | 2947288           |
| 0.0206        | 0.6637 | 65   | 2.6139          | 3192944           |
| 0.02          | 0.7147 | 70   | 2.6100          | 3432568           |
| 0.0204        | 0.7658 | 75   | 2.5829          | 3677032           |
| 0.0213        | 0.8168 | 80   | 2.5711          | 3922712           |
| 0.0209        | 0.8679 | 85   | 2.5732          | 4172608           |
| 0.0192        | 0.9190 | 90   | 2.5755          | 4418512           |
| 0.0197        | 0.9700 | 95   | 2.5900          | 4665416           |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
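A small environment-check sketch (an illustrative addition, not part of the original card) to confirm a local install matches the versions above:

```python
# Compare installed library versions against the framework versions listed above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "Transformers": (transformers.__version__, "4.44.0"),
    "PyTorch": (torch.__version__, "2.4.0+cu121"),
    "Datasets": (datasets.__version__, "2.20.0"),
    "Tokenizers": (tokenizers.__version__, "0.19.1"),
}
for name, (found, wanted) in expected.items():
    status = "OK" if found == wanted else f"MISMATCH (found {found})"
    print(f"{name} {wanted}: {status}")
```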
Model details

  • Model size: 2.61B params
  • Weight format: Safetensors
  • Tensor type: BF16
