llama3.2-1B-dpo-v1

This model is a PEFT adapter for meta-llama/Llama-3.2-1B-Instruct, fine-tuned with DPO (Direct Preference Optimization) on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5037
  • Rewards/chosen: -1.5976
  • Rewards/rejected: -4.4612
  • Rewards/accuracies: 0.7913
  • Rewards/margins: 2.8637
  • Logps/rejected: -449.0548
  • Logps/chosen: -492.4157
  • Logits/rejected: -0.4987
  • Logits/chosen: -0.2456
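
Because this repository is distributed as a PEFT adapter rather than a full model, it needs to be applied on top of the base Instruct model at load time. The snippet below is a minimal sketch of one way to do this; the prompt content and generation settings are illustrative and not part of the original card.

```python
# Minimal loading sketch (not part of the original card): apply this repository
# as a PEFT adapter on top of the base Instruct model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-1B-Instruct"
adapter_id = "BBexist/llama3.2-1B-dpo-v1"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)

# Illustrative prompt; use the base model's chat template for inference.
messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```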

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a hedged sketch of an equivalent TRL setup follows the list:

  • learning_rate: 5e-05
  • train_batch_size: 6
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
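
The training script itself is not included in this card. The sketch below shows how the hyperparameters above might map onto a TRL `DPOConfig`/`DPOTrainer` run; the TRL version, the DPO `beta`, the exact PEFT adapter configuration, and the training dataset are not reported, so those parts are labeled as assumptions or stand-ins.

```python
# Hedged sketch only: the actual training code is not part of this card.
# The DPO beta, the PEFT/LoRA settings, and the dataset are assumptions;
# the toy preference pairs below stand in for the (unknown) training data.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "meta-llama/Llama-3.2-1B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # common workaround; not reported in the card

train_prefs = Dataset.from_dict({
    "prompt": ["What does DPO optimize?"],
    "chosen": ["It directly optimizes a policy on preference pairs."],
    "rejected": ["No idea."],
})

training_args = DPOConfig(
    output_dir="llama3.2-1B-dpo-v1",
    learning_rate=5e-5,
    per_device_train_batch_size=6,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",          # Adam with betas=(0.9, 0.999), eps=1e-8, as listed above
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    # beta is left at the TRL default; the value used for this model is not reported.
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_prefs,
    eval_dataset=train_prefs,                       # placeholder; real eval split not reported
    tokenizer=tokenizer,                            # newer TRL releases use `processing_class=`
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # adapter type/config assumed, not reported
)
trainer.train()
```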

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5478        | 0.0763 | 700  | 0.5414          | -0.1038        | -1.2895          | 0.7302             | 1.1857          | -417.3377      | -477.4783    | 0.2274          | 0.3639        |
| 0.5243        | 0.1527 | 1400 | 0.6105          | -1.3917        | -3.5270          | 0.7313             | 2.1353          | -439.7127      | -490.3575    | 0.3525          | 0.5136        |
| 0.6483        | 0.2290 | 2100 | 0.6370          | -3.1503        | -5.7506          | 0.7482             | 2.6003          | -461.9483      | -507.9432    | 0.2785          | 0.4243        |
| 0.687         | 0.3053 | 2800 | 0.5835          | -0.2196        | -2.3802          | 0.7391             | 2.1606          | -428.2447      | -478.6364    | 0.3201          | 0.4561        |
| 0.5813        | 0.3816 | 3500 | 0.5808          | -0.6116        | -3.0983          | 0.7609             | 2.4868          | -435.4256      | -482.5557    | 0.0172          | 0.2140        |
| 0.7066        | 0.4580 | 4200 | 0.5681          | -1.1058        | -3.4796          | 0.7564             | 2.3738          | -439.2385      | -487.4986    | 0.0611          | 0.2653        |
| 0.6408        | 0.5343 | 4900 | 0.5910          | -0.7319        | -3.5281          | 0.7659             | 2.7962          | -439.7232      | -483.7594    | -0.0603         | 0.1582        |
| 0.4565        | 0.6106 | 5600 | 0.5367          | -1.0688        | -3.9321          | 0.7867             | 2.8633          | -443.7639      | -487.1283    | -0.1924         | 0.0583        |
| 0.5482        | 0.6869 | 6300 | 0.5267          | -1.4234        | -4.2466          | 0.7888             | 2.8232          | -446.9083      | -490.6742    | -0.4528         | -0.2006       |
| 0.5196        | 0.7633 | 7000 | 0.5322          | -2.2017        | -5.1279          | 0.7888             | 2.9261          | -455.7211      | -498.4576    | -0.6046         | -0.3654       |
| 0.4858        | 0.8396 | 7700 | 0.5116          | -2.0986        | -5.0640          | 0.7938             | 2.9653          | -455.0820      | -497.4264    | -0.5200         | -0.2768       |
| 0.4581        | 0.9159 | 8400 | 0.5051          | -1.6669        | -4.5613          | 0.7913             | 2.8944          | -450.0557      | -493.1090    | -0.5097         | -0.2589       |
| 0.3934        | 0.9923 | 9100 | 0.5037          | -1.5976        | -4.4612          | 0.7913             | 2.8637          | -449.0548      | -492.4157    | -0.4987         | -0.2456       |

Framework versions

  • PEFT 0.8.2
  • Transformers 4.45.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.16.1
  • Tokenizers 0.20.0