
collapse_gemma-2-2b_hs2_accumulatesubsample_iter15_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1998
  • Num Input Tokens Seen: 4936552
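The card does not yet document usage, so here is a minimal loading sketch assuming the standard transformers causal-LM API; the prompt and generation settings are illustrative, not from the card, and BF16 follows the tensor type of the released weights.

```python
# Minimal loading sketch (assumed standard transformers usage; not from the card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter15_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # released checkpoint is stored in BF16
)

# Illustrative prompt and generation settings.
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```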

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent TrainingArguments follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
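The sketch below shows how these values map onto transformers TrainingArguments; it is a reconstruction under that assumption, not the original training script. The output path is a placeholder, and only the listed values come from the card.

```python
# Sketch of the hyperparameters above as transformers TrainingArguments
# (assumed mapping; output_dir is a placeholder).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./collapse_gemma-2-2b_hs2_accumulatesubsample_iter15_sftsd1",
    learning_rate=8e-06,
    per_device_train_batch_size=8,   # 8 * 16 accumulation steps = 128 total train batch size
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # Adam settings as listed on the card
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```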

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.3791        | 0.0528 | 5    | 1.2741          | 270544            |
| 1.0969        | 0.1057 | 10   | 1.2081          | 532816            |
| 0.9217        | 0.1585 | 15   | 1.2099          | 798280            |
| 0.7940        | 0.2114 | 20   | 1.2350          | 1062080           |
| 0.6797        | 0.2642 | 25   | 1.2627          | 1325448           |
| 0.5979        | 0.3170 | 30   | 1.2847          | 1587720           |
| 0.5021        | 0.3699 | 35   | 1.2607          | 1843760           |
| 0.4760        | 0.4227 | 40   | 1.2628          | 2107456           |
| 0.4870        | 0.4756 | 45   | 1.2493          | 2370112           |
| 0.3335        | 0.5284 | 50   | 1.2433          | 2630520           |
| 0.2871        | 0.5812 | 55   | 1.2330          | 2893392           |
| 0.4017        | 0.6341 | 60   | 1.2188          | 3156472           |
| 0.4512        | 0.6869 | 65   | 1.2056          | 3422672           |
| 0.3667        | 0.7398 | 70   | 1.2041          | 3688952           |
| 0.3660        | 0.7926 | 75   | 1.2002          | 3944200           |
| 0.4095        | 0.8454 | 80   | 1.2073          | 4204976           |
| 0.2232        | 0.8983 | 85   | 1.2062          | 4464672           |
| 0.3349        | 0.9511 | 90   | 1.1993          | 4725696           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1