# palige_original_lora_32_epo_12
This model is a fine-tuned version of [google/paligemma-3b-pt-224](https://huggingface.co/google/paligemma-3b-pt-224) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.8822
## Model description
More information needed
## Intended uses & limitations
More information needed
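Usage details are not documented, but since this is a PEFT LoRA adapter on top of google/paligemma-3b-pt-224, loading it for inference would plausibly look like the sketch below. The adapter repo id comes from this card; the image path and the `caption en` prompt are placeholder assumptions.

```python
# A minimal inference sketch (not from the card). The adapter id is this repo;
# the image path and prompt below are placeholders.
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from peft import PeftModel

base_id = "google/paligemma-3b-pt-224"
adapter_id = "RoyRoyRpy/palige_original_lora_32_epo_12"

processor = AutoProcessor.from_pretrained(base_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the LoRA weights
model.eval()

image = Image.open("example.jpg")  # placeholder image
inputs = processor(text="caption en", images=image, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(processor.decode(output[0], skip_special_tokens=True))
```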
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (reconstructed in the sketch after this list):
- learning_rate: 2e-05
- train_batch_size: 10
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 40
- optimizer: AdamW (`adamw_hf`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- num_epochs: 12
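For reference, the list above maps onto Transformers `TrainingArguments` roughly as follows; the `output_dir` is an assumption, and the actual training script may differ.

```python
# A hedged reconstruction of the hyperparameters above; output_dir is assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="palige_original_lora_32_epo_12",  # assumed
    learning_rate=2e-5,
    per_device_train_batch_size=10,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,  # effective train batch size: 10 * 4 = 40
    optim="adamw_hf",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=2,
    num_train_epochs=12,
)
```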
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 4.3289 | 0.3125 | 100 | 2.5898 |
| 2.3882 | 0.625 | 200 | 1.7361 |
| 1.9019 | 0.9375 | 300 | 1.3873 |
| 1.6578 | 1.25 | 400 | 1.2264 |
| 1.5708 | 1.5625 | 500 | 1.1162 |
| 1.4543 | 1.875 | 600 | 1.0530 |
| 1.3442 | 2.1875 | 700 | 0.9998 |
| 1.3264 | 2.5 | 800 | 0.9652 |
| 1.2714 | 2.8125 | 900 | 0.9281 |
| 1.1041 | 3.125 | 1000 | 0.9080 |
| 1.1605 | 3.4375 | 1100 | 0.8901 |
| 1.16 | 3.75 | 1200 | 0.8842 |
| 1.0577 | 4.0625 | 1300 | 0.8620 |
| 1.0074 | 4.375 | 1400 | 0.8497 |
| 1.0002 | 4.6875 | 1500 | 0.8355 |
| 0.9938 | 5.0 | 1600 | 0.8245 |
| 0.8815 | 5.3125 | 1700 | 0.8342 |
| 0.8922 | 5.625 | 1800 | 0.8133 |
| 0.901 | 5.9375 | 1900 | 0.8153 |
| 0.8109 | 6.25 | 2000 | 0.8217 |
| 0.8084 | 6.5625 | 2100 | 0.8126 |
| 0.8453 | 6.875 | 2200 | 0.8100 |
| 0.736 | 7.1875 | 2300 | 0.8091 |
| 0.732 | 7.5 | 2400 | 0.8186 |
| 0.7008 | 7.8125 | 2500 | 0.8007 |
| 0.6708 | 8.125 | 2600 | 0.8148 |
| 0.6406 | 8.4375 | 2700 | 0.8274 |
| 0.6541 | 8.75 | 2800 | 0.8215 |
| 0.6345 | 9.0625 | 2900 | 0.8511 |
| 0.5534 | 9.375 | 3000 | 0.8355 |
| 0.5553 | 9.6875 | 3100 | 0.8398 |
| 0.57 | 10.0 | 3200 | 0.8397 |
| 0.499 | 10.3125 | 3300 | 0.8666 |
| 0.4909 | 10.625 | 3400 | 0.8768 |
| 0.5028 | 10.9375 | 3500 | 0.8628 |
| 0.4397 | 11.25 | 3600 | 0.9102 |
| 0.4378 | 11.5625 | 3700 | 0.8732 |
| 0.4248 | 11.875 | 3800 | 0.8822 |
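Note that validation loss bottoms out at 0.8007 around step 2500 (epoch ≈ 7.8) and then drifts upward while training loss keeps falling, a typical sign of overfitting; the reported final loss of 0.8822 is therefore not the best checkpoint. If retraining, one hypothetical way to keep the best checkpoint is to extend the `TrainingArguments` sketch above:

```python
# Hypothetical additions (not from the card) that would retain the checkpoint
# with the lowest validation loss instead of the final one.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="palige_original_lora_32_epo_12",  # assumed
    eval_strategy="steps",
    eval_steps=100,               # matches the 100-step eval cadence in the table
    save_strategy="steps",
    save_steps=100,
    load_best_model_at_end=True,  # restore the best checkpoint when training ends
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```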
### Framework versions
- PEFT 0.13.0
- Transformers 4.46.0.dev0
- PyTorch 2.4.1+cu121
- Datasets 3.0.1
- Tokenizers 0.20.0