
collapse_gemma-2-2b_hs2_replace_iter7_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5953
  • Num Input Tokens Seen: 7903776
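For a quick check of the checkpoint, a minimal generation sketch follows. It assumes the standard Transformers causal-LM API and uses the repository id from this card; the prompt and generation settings are illustrative, and access to the gated google/gemma-2-2b weights may be required.

```python
# Minimal generation sketch (illustrative; not from the original card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_replace_iter7_sftsd0"  # repo id from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
# bfloat16 is an assumption matching the checkpoint's BF16 weights
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "The quick brown fox"  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```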

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows this list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
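As referenced above, here is a minimal sketch of the equivalent Hugging Face TrainingArguments. The argument names follow the standard Trainer API; the output directory and the bf16 flag are assumptions, since the original training script is not part of this card.

```python
# Sketch of TrainingArguments matching the hyperparameters listed above.
# The dataset, model wiring, and Trainer setup are not shown and are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter7_sftsd0",  # assumed name
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 x 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption: matches the checkpoint's BF16 tensor type
)
```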

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.682         | 0.0315 | 5    | 1.3074          | 243600            |
| 1.1958        | 0.0630 | 10   | 1.2384          | 489576            |
| 0.8251        | 0.0945 | 15   | 1.3304          | 736640            |
| 0.5381        | 0.1260 | 20   | 1.4985          | 987872            |
| 0.2729        | 0.1575 | 25   | 1.6796          | 1238504           |
| 0.2581        | 0.1890 | 30   | 1.8457          | 1493072           |
| 0.1176        | 0.2205 | 35   | 2.0106          | 1744312           |
| 0.0711        | 0.2520 | 40   | 2.1725          | 1992528           |
| 0.05          | 0.2835 | 45   | 2.2581          | 2243824           |
| 0.0473        | 0.3150 | 50   | 2.3984          | 2490888           |
| 0.0535        | 0.3465 | 55   | 2.4441          | 2740728           |
| 0.032         | 0.3780 | 60   | 2.4463          | 2979648           |
| 0.0318        | 0.4094 | 65   | 2.4594          | 3231056           |
| 0.0359        | 0.4409 | 70   | 2.4814          | 3481768           |
| 0.0294        | 0.4724 | 75   | 2.5039          | 3739344           |
| 0.0275        | 0.5039 | 80   | 2.4899          | 3999888           |
| 0.0298        | 0.5354 | 85   | 2.4773          | 4250720           |
| 0.0296        | 0.5669 | 90   | 2.5022          | 4506360           |
| 0.0243        | 0.5984 | 95   | 2.5058          | 4764496           |
| 0.027         | 0.6299 | 100  | 2.5154          | 5009024           |
| 0.0268        | 0.6614 | 105  | 2.5056          | 5257688           |
| 0.0292        | 0.6929 | 110  | 2.5422          | 5501784           |
| 0.0297        | 0.7244 | 115  | 2.5510          | 5757400           |
| 0.0266        | 0.7559 | 120  | 2.5546          | 6003016           |
| 0.0255        | 0.7874 | 125  | 2.5727          | 6255120           |
| 0.0258        | 0.8189 | 130  | 2.5746          | 6501384           |
| 0.0282        | 0.8504 | 135  | 2.5777          | 6752008           |
| 0.0303        | 0.8819 | 140  | 2.5688          | 7004584           |
| 0.0259        | 0.9134 | 145  | 2.5679          | 7258288           |
| 0.0244        | 0.9449 | 150  | 2.5847          | 7508928           |
| 0.024         | 0.9764 | 155  | 2.5936          | 7755344           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
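To approximate this environment, the pinned versions above can be installed with `pip install transformers==4.44.0 torch==2.4.0 datasets==2.20.0 tokenizers==0.19.1` (assuming a CUDA 12.1 PyTorch build, to match 2.4.0+cu121).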