---
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter12_sftsd2
    results: []
---

collapse_gemma-2-2b_hs2_replace_iter12_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5455
  • Num Input Tokens Seen: 4776248
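
The card does not include usage code. As a minimal loading sketch, assuming the standard transformers causal-LM API and a repository id of RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter12_sftsd2 (the repo id is inferred from the model name, not stated in the card):

```python
# Minimal loading sketch (not from the original card).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter12_sftsd2"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```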

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
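
The actual training script and dataset are not documented. The following is only a sketch of how the hyperparameters above could map onto a TRL SFT run; the dataset stand-in, output directory, and the use of SFTTrainer with plain TrainingArguments are assumptions, not details from the card:

```python
# Sketch only: the real dataset and script behind this model are not documented.
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Dummy stand-in dataset; the card describes the data only as "an unknown dataset".
train_dataset = Dataset.from_dict({"text": ["example text 1", "example text 2"]})

args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter12_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 match the Trainer's optimizer defaults.
)

trainer = SFTTrainer(
    model="google/gemma-2-2b",     # base model; this repo holds the fine-tuned weights
    args=args,
    train_dataset=train_dataset,   # placeholder for the undocumented dataset
)
trainer.train()
```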

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.576         | 0.0511 | 5    | 1.2789          | 245968            |
| 0.8863        | 0.1022 | 10   | 1.2945          | 492152            |
| 0.425         | 0.1533 | 15   | 1.5320          | 737520            |
| 0.2379        | 0.2043 | 20   | 1.7713          | 982160            |
| 0.0907        | 0.2554 | 25   | 1.9824          | 1224512           |
| 0.0414        | 0.3065 | 30   | 2.2019          | 1470552           |
| 0.0276        | 0.3576 | 35   | 2.3755          | 1714072           |
| 0.0296        | 0.4087 | 40   | 2.4664          | 1966528           |
| 0.0225        | 0.4598 | 45   | 2.4860          | 2212000           |
| 0.0211        | 0.5109 | 50   | 2.5160          | 2456616           |
| 0.022         | 0.5619 | 55   | 2.5310          | 2704328           |
| 0.0214        | 0.6130 | 60   | 2.5350          | 2960528           |
| 0.0205        | 0.6641 | 65   | 2.5329          | 3202432           |
| 0.021         | 0.7152 | 70   | 2.5373          | 3446344           |
| 0.0201        | 0.7663 | 75   | 2.5473          | 3689160           |
| 0.0216        | 0.8174 | 80   | 2.5416          | 3935112           |
| 0.0194        | 0.8685 | 85   | 2.5472          | 4182704           |
| 0.0209        | 0.9195 | 90   | 2.5474          | 4433736           |
| 0.0213        | 0.9706 | 95   | 2.5460          | 4680728           |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
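
As a small sketch (not part of the original card), the installed environment can be compared against the versions listed above:

```python
# Compare local library versions against those listed in this card.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}

installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}

for name, want in expected.items():
    have = installed[name]
    status = "OK" if have == want else "MISMATCH"
    print(f"{name}: expected {want}, installed {have} [{status}]")
```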