
collapse_gemma-2-2b_hs2_accumulatesubsample_iter19_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2160
  • Num Input Tokens Seen: 4969888
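
Since this is a fine-tune of google/gemma-2-2b trained with Transformers, the checkpoint can probably be loaded with the standard causal-LM classes. The following is a minimal sketch, not a confirmed recipe: the repository id is taken from the model tree listing on this page, and the assumptions that the repository ships a tokenizer and loads through `AutoModelForCausalLM` are mine (the card does not state the library). The helper names `load_model_and_tokenizer` and `generate` are hypothetical.

```python
# Repository id as shown in the model tree listing of this card.
MODEL_ID = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter19_sftsd2"


def load_model_and_tokenizer(model_id: str = MODEL_ID):
    """Load the checkpoint in bfloat16 (the tensor type listed on the card).

    Assumption: the repo contains tokenizer files; if it does not, the
    base model's tokenizer ("google/gemma-2-2b") is the likely fallback.
    """
    # Imported lazily so the module can be inspected without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    model.eval()
    return model, tokenizer


def generate(model, tokenizer, prompt: str, max_new_tokens: int = 32) -> str:
    """Greedy-decode a short continuation of `prompt`."""
    import torch

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    model, tokenizer = load_model_and_tokenizer()
    print(generate(model, tokenizer, "The capital of France is"))
```

Because the card gives no intended use or training data, treat any generations as exploratory.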

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
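
Two of the values above follow from the others, which the sketch below tries to make explicit: the total batch size is the per-device batch size times the accumulation steps, and `constant_with_warmup` ramps the learning rate linearly to its target and then holds it. This is a plain-Python illustration, not the training code. It assumes a single device, and `NUM_TRAINING_STEPS = 94` is my estimate from the logged epoch fractions rather than a value stated in the card.

```python
import math

LEARNING_RATE = 8e-06
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 16
WARMUP_RATIO = 0.05
NUM_DEVICES = 1            # assumption: single-device training
NUM_TRAINING_STEPS = 94    # assumption: estimated, not stated in the card

# Effective (total) batch size per optimizer step.
total_train_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES

# Warmup length derived from the ratio, rounded up.
num_warmup_steps = math.ceil(NUM_TRAINING_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate under a constant-with-warmup schedule.

    Linear ramp from 0 to LEARNING_RATE over the warmup steps,
    then constant for the rest of training.
    """
    if step < num_warmup_steps:
        return LEARNING_RATE * step / max(1, num_warmup_steps)
    return LEARNING_RATE


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size)
    print("num_warmup_steps:", num_warmup_steps)
    for step in (0, 1, 2, 5, 50):
        print(step, lr_at(step))
```

Under these assumptions the warmup covers about the first 5 optimizer steps, so nearly the whole run uses the full learning rate of 8e-06.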

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| No log | 0 | 0 | 1.3909 | 0 |
| 1.3282 | 0.0529 | 5 | 1.2782 | 268552 |
| 1.0606 | 0.1058 | 10 | 1.2285 | 533864 |
| 0.9673 | 0.1587 | 15 | 1.2222 | 799192 |
| 0.7577 | 0.2116 | 20 | 1.2580 | 1065712 |
| 0.7055 | 0.2646 | 25 | 1.2578 | 1334136 |
| 0.6601 | 0.3175 | 30 | 1.2654 | 1600744 |
| 0.5988 | 0.3704 | 35 | 1.2742 | 1865248 |
| 0.5391 | 0.4233 | 40 | 1.2674 | 2126184 |
| 0.5215 | 0.4762 | 45 | 1.2479 | 2389800 |
| 0.4847 | 0.5291 | 50 | 1.2539 | 2652896 |
| 0.3997 | 0.5820 | 55 | 1.2492 | 2917336 |
| 0.4981 | 0.6349 | 60 | 1.2381 | 3182592 |
| 0.422 | 0.6878 | 65 | 1.2312 | 3444800 |
| 0.4256 | 0.7407 | 70 | 1.2293 | 3706456 |
| 0.3611 | 0.7937 | 75 | 1.2366 | 3968992 |
| 0.4669 | 0.8466 | 80 | 1.2204 | 4236704 |
| 0.3871 | 0.8995 | 85 | 1.2243 | 4494952 |
| 0.4819 | 0.9524 | 90 | 1.2215 | 4752080 |
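
A few quantities can be read off the logged results with simple arithmetic. The sketch below copies the evaluation rows from the table and computes the logged step with the lowest validation loss, a rough estimate of optimizer steps per epoch, and the average number of input tokens per step. The function names are hypothetical, and the per-epoch estimate is only as precise as the rounded epoch fractions allow.

```python
# (step, epoch, validation_loss, input_tokens_seen), copied from the results table.
ROWS = [
    (0, 0.0, 1.3909, 0),
    (5, 0.0529, 1.2782, 268552),
    (10, 0.1058, 1.2285, 533864),
    (15, 0.1587, 1.2222, 799192),
    (20, 0.2116, 1.2580, 1065712),
    (25, 0.2646, 1.2578, 1334136),
    (30, 0.3175, 1.2654, 1600744),
    (35, 0.3704, 1.2742, 1865248),
    (40, 0.4233, 1.2674, 2126184),
    (45, 0.4762, 1.2479, 2389800),
    (50, 0.5291, 1.2539, 2652896),
    (55, 0.5820, 1.2492, 2917336),
    (60, 0.6349, 1.2381, 3182592),
    (65, 0.6878, 1.2312, 3444800),
    (70, 0.7407, 1.2293, 3706456),
    (75, 0.7937, 1.2366, 3968992),
    (80, 0.8466, 1.2204, 4236704),
    (85, 0.8995, 1.2243, 4494952),
    (90, 0.9524, 1.2215, 4752080),
]


def best_logged_eval(rows):
    """Return the row with the lowest validation loss among the logged evaluations."""
    return min(rows, key=lambda row: row[2])


def estimated_steps_per_epoch(rows):
    """Estimate optimizer steps per epoch from the last logged (step, epoch) pair."""
    step, epoch, _, _ = rows[-1]
    return step / epoch


def mean_tokens_per_step(rows):
    """Average input tokens consumed per optimizer step up to the last logged row."""
    step, _, _, tokens = rows[-1]
    return tokens / step


if __name__ == "__main__":
    print("best logged eval:", best_logged_eval(ROWS))
    print("steps per epoch (approx.):", round(estimated_steps_per_epoch(ROWS), 1))
    print("tokens per step (approx.):", round(mean_tokens_per_step(ROWS)))
```

Among the logged evaluations, the lowest validation loss is 1.2204 at step 80. The 1.2160 reported at the top of the card is lower than any logged row, so it presumably comes from a final evaluation after step 90.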

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
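
To compare a local environment against the versions listed above, something like the following standard-library check may help. It is a sketch only: the distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are the usual PyPI names, and the helper `compare_versions` is hypothetical. The exact string reported for PyTorch can differ by build, so a mismatch there is not necessarily a problem.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions as listed in the card, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def compare_versions(expected):
    """Map each package to (expected, installed or None, exact-match flag)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for package, (wanted, installed, matches) in compare_versions(EXPECTED).items():
        status = "ok" if matches else "differs"
        print(f"{package}: expected {wanted}, found {installed} ({status})")
```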

Model size: 2.61B params (Safetensors)
Tensor type: BF16

Model tree for RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter19_sftsd2

  • Base model: google/gemma-2-2b
  • This model: fine-tuned from the base model