
collapse_gemma-2-2b_hs2_replace_iter9_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5305
  • Num Input Tokens Seen: 4805008

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
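
As a rough cross-check of the list above, the sketch below recomputes the total train batch size and illustrates what a constant-with-warmup schedule typically looks like (linear ramp, then flat). It assumes a single training device, and the `total_steps` value passed in the usage line is a hypothetical estimate, not a number reported by this card. The names `hparams`, `num_devices`, and `lr_at_step` are illustrative only.

```python
import math

# Values copied from the hyperparameter list in this card.
hparams = {
    "learning_rate": 8e-06,
    "train_batch_size": 8,
    "eval_batch_size": 16,
    "seed": 0,
    "gradient_accumulation_steps": 16,
    "total_train_batch_size": 128,
    "lr_scheduler_type": "constant_with_warmup",
    "lr_scheduler_warmup_ratio": 0.05,
    "num_epochs": 1,
}

# Assumption: one device. The card does not state the device count.
num_devices = 1
effective_batch = (
    hparams["train_batch_size"]
    * hparams["gradient_accumulation_steps"]
    * num_devices
)


def lr_at_step(step, total_steps,
               base_lr=hparams["learning_rate"],
               warmup_ratio=hparams["lr_scheduler_warmup_ratio"]):
    """Linear warmup from 0 to base_lr, then constant at base_lr."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


if __name__ == "__main__":
    print("effective batch size:", effective_batch)
    # total_steps=97 is a hypothetical value chosen for illustration.
    for s in (0, 2, 5, 50):
        print(s, lr_at_step(s, total_steps=97))
```

Under the single-device assumption, 8 × 16 reproduces the reported total of 128.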

Training results

Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen
No log        | 0      | 0    | 1.3909          | 0
1.5231        | 0.0513 | 5    | 1.2790          | 249840
0.9028        | 0.1027 | 10   | 1.2972          | 494032
0.5406        | 0.1540 | 15   | 1.5467          | 747896
0.2367        | 0.2054 | 20   | 1.8042          | 994456
0.1891        | 0.2567 | 25   | 1.9924          | 1238888
0.0891        | 0.3081 | 30   | 2.1582          | 1483192
0.0613        | 0.3594 | 35   | 2.3303          | 1726032
0.0361        | 0.4108 | 40   | 2.4317          | 1973864
0.0255        | 0.4621 | 45   | 2.4696          | 2224064
0.0251        | 0.5135 | 50   | 2.5037          | 2481064
0.0244        | 0.5648 | 55   | 2.5279          | 2724856
0.0234        | 0.6162 | 60   | 2.5367          | 2979392
0.0255        | 0.6675 | 65   | 2.5210          | 3223656
0.0291        | 0.7189 | 70   | 2.5165          | 3468936
0.0237        | 0.7702 | 75   | 2.4977          | 3711296
0.0233        | 0.8216 | 80   | 2.4937          | 3960920
0.0217        | 0.8729 | 85   | 2.5052          | 4202464
0.0228        | 0.9243 | 90   | 2.5141          | 4452272
0.0221        | 0.9756 | 95   | 2.5258          | 4700624
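
One way to read the log above is to locate the step with the lowest validation loss and compare training and validation loss at the last logged step. The sketch below does that with values copied from the table. The variable names are illustrative, and the step-0 training loss is stored as `None` because the card reports "No log" there.

```python
# (step, training_loss, validation_loss, input_tokens_seen), copied from the log.
rows = [
    (0, None, 1.3909, 0),
    (5, 1.5231, 1.2790, 249840),
    (10, 0.9028, 1.2972, 494032),
    (15, 0.5406, 1.5467, 747896),
    (20, 0.2367, 1.8042, 994456),
    (25, 0.1891, 1.9924, 1238888),
    (30, 0.0891, 2.1582, 1483192),
    (35, 0.0613, 2.3303, 1726032),
    (40, 0.0361, 2.4317, 1973864),
    (45, 0.0255, 2.4696, 2224064),
    (50, 0.0251, 2.5037, 2481064),
    (55, 0.0244, 2.5279, 2724856),
    (60, 0.0234, 2.5367, 2979392),
    (65, 0.0255, 2.5210, 3223656),
    (70, 0.0291, 2.5165, 3468936),
    (75, 0.0237, 2.4977, 3711296),
    (80, 0.0233, 2.4937, 3960920),
    (85, 0.0217, 2.5052, 4202464),
    (90, 0.0228, 2.5141, 4452272),
    (95, 0.0221, 2.5258, 4700624),
]

# Step with the lowest validation loss.
best = min(rows, key=lambda r: r[2])

# Gap between validation and training loss at the last logged step.
last = rows[-1]
final_gap = last[2] - last[1]

if __name__ == "__main__":
    print(f"lowest validation loss {best[2]:.4f} at step {best[0]}")
    print(f"final train/validation gap: {final_gap:.4f}")
```

On these numbers the lowest validation loss is at step 5, after which validation loss rises while training loss keeps falling.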

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size: 2.61B params (Safetensors, tensor type BF16)

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter9_sftsd0

  • Base model: google/gemma-2-2b
  • Finetuned (378): this model