
results__fullrun__2110-104610

This model is a fine-tuned version of google/paligemma-3b-mix-448 on an unknown dataset. It achieves the following results on the evaluation set:

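  • Loss: 3.2421 (final evaluation at step 3600; the lowest validation loss in the Training results table is 2.3871 at step 723)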

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_steps: 2
  • num_epochs: 20
  • mixed_precision_training: Native AMP
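
As a rough sketch, the hyperparameters above can be written as keyword arguments in the style of `transformers.TrainingArguments`, which also shows where the total train batch size of 64 comes from. The mapping of `train_batch_size` to `per_device_train_batch_size` and the single-device setting (`NUM_DEVICES = 1`) are assumptions, since the card does not state them. The mixed-precision flag is left out because the card does not say whether fp16 or bf16 was used.

```python
# Keyword names follow transformers.TrainingArguments; values come from the list above.
# The precision flag is omitted: "Native AMP" does not say whether fp16 or bf16 was used.
training_kwargs = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 8,   # assumed to correspond to train_batch_size
    "per_device_eval_batch_size": 8,    # assumed to correspond to eval_batch_size
    "seed": 42,
    "gradient_accumulation_steps": 8,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "constant",
    "warmup_steps": 2,
    "num_train_epochs": 20,
}

NUM_DEVICES = 1  # assumption: the card does not state the device count


def total_train_batch_size(kwargs: dict, num_devices: int) -> int:
    """Examples consumed per optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


print(total_train_batch_size(training_kwargs, NUM_DEVICES))  # 64
```

With one device, 8 examples per batch times 8 accumulation steps gives the reported 64. The dictionary could be unpacked into `TrainingArguments(output_dir=..., **training_kwargs)` if the installed Transformers version accepts these names.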

Training results

Training Loss  Epoch    Step  Validation Loss
2.6855         0.9952   180   2.6276
2.4044         1.9959   361   2.4716
2.2275         2.9965   542   2.4070
2.092          3.9972   723   2.3871
1.9761         4.9979   904   2.3929
1.8674         5.9986   1085  2.4194
1.7501         6.9993   1266  2.4726
1.6706         8.0      1447  2.5062
1.5599         8.9952   1627  2.5492
1.4896         9.9959   1808  2.6080
1.4289         10.9965  1989  2.6687
1.3458         11.9972  2170  2.7300
1.2746         12.9979  2351  2.7933
1.2656         13.9986  2532  2.8295
1.1751         14.9993  2713  2.9203
1.1792         16.0     2894  2.9811
1.0851         16.9952  3074  3.0481
1.0966         17.9959  3255  3.0981
1.0581         18.9965  3436  3.1394
1.0055         19.9032  3600  3.2421
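
The table can be checked with a few lines of Python. This sketch finds the row with the lowest validation loss and measures the gap between training and validation loss at the final step. It also reproduces the fractional epoch values under the assumption of 1447 batches per epoch on a single device, a figure inferred from step 1447 landing exactly on epoch 8.0 with 8 accumulation steps rather than stated in the card.

```python
GRAD_ACCUM = 8            # gradient_accumulation_steps from the card
BATCHES_PER_EPOCH = 1447  # assumption: inferred from step 1447 == epoch 8.0

# (training loss, epoch, step, validation loss), copied from the table
rows = [
    (2.6855, 0.9952, 180, 2.6276), (2.4044, 1.9959, 361, 2.4716),
    (2.2275, 2.9965, 542, 2.4070), (2.0920, 3.9972, 723, 2.3871),
    (1.9761, 4.9979, 904, 2.3929), (1.8674, 5.9986, 1085, 2.4194),
    (1.7501, 6.9993, 1266, 2.4726), (1.6706, 8.0, 1447, 2.5062),
    (1.5599, 8.9952, 1627, 2.5492), (1.4896, 9.9959, 1808, 2.6080),
    (1.4289, 10.9965, 1989, 2.6687), (1.3458, 11.9972, 2170, 2.7300),
    (1.2746, 12.9979, 2351, 2.7933), (1.2656, 13.9986, 2532, 2.8295),
    (1.1751, 14.9993, 2713, 2.9203), (1.1792, 16.0, 2894, 2.9811),
    (1.0851, 16.9952, 3074, 3.0481), (1.0966, 17.9959, 3255, 3.0981),
    (1.0581, 18.9965, 3436, 3.1394), (1.0055, 19.9032, 3600, 3.2421),
]

# Row with the lowest validation loss
best = min(rows, key=lambda r: r[3])
print(best[2], best[3])  # 723 2.3871

# Gap between validation and training loss at the last logged step
final = rows[-1]
final_gap = final[3] - final[0]

# Every epoch value matches step * GRAD_ACCUM / BATCHES_PER_EPOCH to 4 decimals
epochs_consistent = all(
    abs(step * GRAD_ACCUM / BATCHES_PER_EPOCH - epoch) < 1e-4
    for _, epoch, step, _ in rows
)
print(epochs_consistent)  # True
```

Validation loss bottoms out around epoch 4 and rises afterwards while training loss keeps falling, which is the usual signature of overfitting. If intermediate checkpoints were saved, the one near step 723 would likely generalize better than the final one.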

Framework versions

  • PEFT 0.13.0
  • Transformers 4.45.1
  • PyTorch 2.3.0.post101
  • Datasets 2.19.1
  • Tokenizers 0.20.0
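
Since the card lists PEFT and the base model google/paligemma-3b-mix-448, one plausible way to load this model is to attach the adapter weights to the base checkpoint. This is a minimal sketch, assuming the repository contains a standard PEFT adapter and that access to the gated base model has been granted; the helper name `load_adapter_model` is hypothetical.

```python
BASE_MODEL_ID = "google/paligemma-3b-mix-448"
ADAPTER_ID = "adishourya/results__fullrun__2110-104610"


def load_adapter_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base PaliGemma model and attach the fine-tuned adapter.

    Hypothetical helper: assumes adapter_id points at a standard PEFT adapter.
    """
    # Imported here so the module can be loaded without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(base_id)
    base = PaliGemmaForConditionalGeneration.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()  # inference mode: disables dropout
    return model, processor
```

Calling `load_adapter_model()` downloads both the base weights and the adapter, so it needs network access and enough memory for a 3B-parameter model.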

Model tree for adishourya/results__fullrun__2110-104610

Adapter
(20)
this model