
collapse_gemma-2-2b_hs2_replace_iter7_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5502
  • Num Input Tokens Seen: 4945256

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
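As a sanity check, the effective batch size and the constant_with_warmup schedule implied by the values above can be sketched as follows (a minimal sketch: the total step count is illustrative, and the linear-warmup formula is the generic one, not taken from this model's training code):

```python
# Hyperparameters from the list above.
learning_rate = 8e-06
train_batch_size = 8
gradient_accumulation_steps = 16
warmup_ratio = 0.05

# Effective (total) train batch size = per-device batch * accumulation steps.
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 8 * 16 = 128

def lr_at(step, total_steps, base_lr=learning_rate, warmup_ratio=warmup_ratio):
    """Generic constant_with_warmup: linear ramp over the warmup steps, then flat."""
    warmup_steps = int(total_steps * warmup_ratio)
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr
```

With roughly 100 optimizer steps in one epoch (the results table logs up to step 95), a 0.05 warmup ratio means the learning rate ramps up over about the first 5 steps and then stays at 8e-06.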

Training results

Training Loss  Epoch   Step  Validation Loss  Input Tokens Seen
No log         0          0           1.3909                  0
1.4075         0.0511     5           1.2768             262544
0.9592         0.1021    10           1.2490             516672
0.6262         0.1532    15           1.3861             768768
0.4505         0.2042    20           1.5683            1029088
0.2604         0.2553    25           1.7472            1282984
0.1477         0.3063    30           1.9911            1535608
0.0720         0.3574    35           2.1880            1790128
0.0485         0.4084    40           2.3094            2042896
0.0376         0.4595    45           2.4429            2298696
0.0293         0.5105    50           2.4744            2551432
0.0301         0.5616    55           2.4918            2814520
0.0241         0.6126    60           2.4981            3062952
0.0233         0.6637    65           2.5132            3321896
0.0241         0.7147    70           2.5177            3582960
0.0220         0.7658    75           2.5261            3830336
0.0216         0.8168    80           2.5296            4090456
0.0233         0.8679    85           2.5449            4344200
0.0224         0.9190    90           2.5459            4595224
0.0242         0.9700    95           2.5496            4844232

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Model details

  • Model size: 2.61B params
  • Tensor type: BF16
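Given 2.61B parameters stored in BF16, a back-of-the-envelope weight-memory estimate (a rough sketch; it ignores activations, KV cache, optimizer state, and runtime overhead):

```python
# Rough estimate of memory needed just to hold the weights.
params = 2.61e9        # 2.61B parameters
bytes_per_param = 2    # BF16 uses 2 bytes per parameter
weight_bytes = params * bytes_per_param
print(f"{weight_bytes / 1e9:.2f} GB")  # prints "5.22 GB"
```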
