collapse_gemma-2-2b_hs2_replace_iter9_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.6202
  • Num Input Tokens Seen: 4782384

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
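
As a sanity check, the reported total_train_batch_size follows directly from the per-device batch size and the gradient accumulation steps. A minimal sketch (plain Python, values copied from the list above):

```python
# Values copied from the hyperparameter list above.
train_batch_size = 8             # per-device train batch size
gradient_accumulation_steps = 16

# Effective (total) train batch size: optimizer steps see this many examples.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # → 128, matching total_train_batch_size above
```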

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4251        | 0.0511 | 5    | 1.2773          | 251936            |
| 1.0412        | 0.1021 | 10   | 1.2473          | 494776            |
| 0.6073        | 0.1532 | 15   | 1.4155          | 739080            |
| 0.4079        | 0.2042 | 20   | 1.6045          | 992504            |
| 0.2094        | 0.2553 | 25   | 1.8387          | 1237192           |
| 0.1016        | 0.3063 | 30   | 2.1297          | 1481728           |
| 0.0516        | 0.3574 | 35   | 2.2672          | 1730840           |
| 0.0373        | 0.4084 | 40   | 2.3948          | 1976944           |
| 0.0293        | 0.4595 | 45   | 2.4808          | 2220288           |
| 0.0264        | 0.5105 | 50   | 2.5189          | 2467184           |
| 0.0285        | 0.5616 | 55   | 2.5581          | 2721304           |
| 0.0236        | 0.6126 | 60   | 2.5681          | 2961768           |
| 0.0228        | 0.6637 | 65   | 2.5784          | 3208208           |
| 0.0235        | 0.7147 | 70   | 2.5833          | 3462120           |
| 0.0239        | 0.7658 | 75   | 2.5890          | 3702984           |
| 0.0230        | 0.8168 | 80   | 2.6044          | 3955448           |
| 0.0233        | 0.8679 | 85   | 2.6159          | 4205584           |
| 0.0226        | 0.9190 | 90   | 2.6276          | 4445256           |
| 0.0246        | 0.9700 | 95   | 2.6255          | 4684392           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
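
To reproduce this environment, the versions above can be pinned, e.g. in a requirements file (note that the CUDA 12.1 build of PyTorch is typically installed from the PyTorch package index rather than plain PyPI):

```text
transformers==4.44.0
torch==2.4.0
datasets==2.20.0
tokenizers==0.19.1
```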