
collapse_gemma-2-2b_hs2_accumulatesubsample_iter19_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set (a brief loading sketch follows the results):

  • Loss: 1.2055
  • Num Input Tokens Seen: 4907024
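The checkpoint can be loaded with the standard transformers API. A minimal sketch, assuming the repo id this card is published under; the prompt and generation settings are purely illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter19_sftsd1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The released weights are stored in BF16, so load in that dtype to match.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```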

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a matching TrainingArguments sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
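These values map directly onto transformers TrainingArguments. A sketch under that assumption; the output path is a hypothetical placeholder, and the dataset and model wiring are omitted since the training data is unknown:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter19_sftsd1",  # hypothetical path
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    # 8 per device * 16 accumulation steps = 128 total train batch size
    # (assuming a single device, consistent with the totals above).
    gradient_accumulation_steps=16,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
```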

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.3427        | 0.0527 | 5    | 1.2782          | 258072            |
| 1.0971        | 0.1053 | 10   | 1.2131          | 521696            |
| 0.9209        | 0.1580 | 15   | 1.2167          | 782872            |
| 0.7304        | 0.2107 | 20   | 1.2697          | 1039040           |
| 0.6214        | 0.2633 | 25   | 1.2589          | 1307632           |
| 0.5449        | 0.3160 | 30   | 1.3018          | 1568000           |
| 0.521         | 0.3687 | 35   | 1.2918          | 1824608           |
| 0.4267        | 0.4213 | 40   | 1.2783          | 2087280           |
| 0.4484        | 0.4740 | 45   | 1.2457          | 2348744           |
| 0.403         | 0.5267 | 50   | 1.2346          | 2610176           |
| 0.3899        | 0.5793 | 55   | 1.2224          | 2873528           |
| 0.3705        | 0.6320 | 60   | 1.2227          | 3133328           |
| 0.3662        | 0.6847 | 65   | 1.2187          | 3395112           |
| 0.3322        | 0.7373 | 70   | 1.2076          | 3656104           |
| 0.3614        | 0.7900 | 75   | 1.2070          | 3917544           |
| 0.3462        | 0.8427 | 80   | 1.2021          | 4174120           |
| 0.3258        | 0.8953 | 85   | 1.2061          | 4437136           |
| 0.3069        | 0.9480 | 90   | 1.2061          | 4699512           |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1