
collapse_gemma-2-2b_hs2_accumulatesubsample_iter5_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1717
  • Num Input Tokens Seen: 5151488
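The card does not include a usage snippet; below is a minimal sketch for loading the checkpoint with the transformers library. The prompt and generation settings are illustrative assumptions, not part of the original card.

```python
# Minimal sketch: load the fine-tuned checkpoint and generate.
# The prompt and generation settings below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter5_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```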

Model description

This is a 2.61B-parameter causal language model; the checkpoint is stored as BF16 safetensors. No further description is provided.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
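Expressed as a transformers TrainingArguments object, the configuration above would look roughly like the sketch below; output_dir and the bf16 flag are assumptions not stated in the card.

```python
# Sketch of the hyperparameters above as transformers TrainingArguments.
# output_dir is a placeholder; bf16=True is an assumption matching the
# BF16 checkpoint dtype. The actual training script is not published here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter5_sftsd2",
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,
)
```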

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4482        | 0.0535 | 5    | 1.2668          | 271872            |
| 1.1802        | 0.1070 | 10   | 1.1813          | 551352            |
| 1.1473        | 0.1605 | 15   | 1.1598          | 829440            |
| 1.0619        | 0.2140 | 20   | 1.1584          | 1104288           |
| 1.1176        | 0.2676 | 25   | 1.1511          | 1383600           |
| 0.9300        | 0.3211 | 30   | 1.1794          | 1664448           |
| 0.8449        | 0.3746 | 35   | 1.1762          | 1942672           |
| 0.7169        | 0.4281 | 40   | 1.1889          | 2215216           |
| 0.8290        | 0.4816 | 45   | 1.1865          | 2492600           |
| 0.7906        | 0.5351 | 50   | 1.1964          | 2777464           |
| 0.7563        | 0.5886 | 55   | 1.1843          | 3047688           |
| 0.5811        | 0.6421 | 60   | 1.1744          | 3329008           |
| 0.5871        | 0.6957 | 65   | 1.1818          | 3602304           |
| 0.6481        | 0.7492 | 70   | 1.1738          | 3877360           |
| 0.6768        | 0.8027 | 75   | 1.1757          | 4156296           |
| 0.5548        | 0.8562 | 80   | 1.1752          | 4435368           |
| 0.5679        | 0.9097 | 85   | 1.1694          | 4707808           |
| 0.6058        | 0.9632 | 90   | 1.1782          | 4986944           |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
