
Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V4

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf, trained with DPO on an unspecified preference dataset. It achieves the following results on the evaluation set (a note after this list sketches how the reward metrics are derived):

  • Loss: 1.2125
  • Rewards/chosen: -3.3104
  • Rewards/rejected: -2.9319
  • Rewards/accuracies: 0.4167
  • Rewards/margins: -0.3786
  • Logps/rejected: -192.9225
  • Logps/chosen: -170.2794
  • Logits/rejected: 0.1199
  • Logits/chosen: 0.1595
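
For context, the reward columns above are the implicit DPO rewards that TRL's DPOTrainer logs during evaluation. The following is a minimal sketch of how such metrics are derived from policy and reference log-probabilities; the beta value is an assumption (it is not recorded in this card):

```python
import torch

# Assumption: beta (the DPO temperature) is not stated in this card;
# 0.1 is TRL's default and is used here purely for illustration.
beta = 0.1

def dpo_reward_metrics(policy_chosen_logps, policy_rejected_logps,
                       ref_chosen_logps, ref_rejected_logps):
    """Implicit DPO rewards: beta * (policy log-prob minus reference log-prob)."""
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected           # -> Rewards/margins
    accuracy = (margins > 0).float().mean()               # -> Rewards/accuracies
    return rewards_chosen.mean(), rewards_rejected.mean(), margins.mean(), accuracy
```

The final evaluation numbers are consistent with this: -3.3104 - (-2.9319) ≈ -0.3786, matching Rewards/margins, and a negative margin with accuracy below 0.5 means the tuned policy's implicit reward favors the rejected response on average.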

Model description

More information needed

Intended uses & limitations

More information needed
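
No usage guidance is provided. As a hedged starting point: this repository is a PEFT adapter on top of meta-llama/Llama-2-7b-hf, so a minimal inference sketch would look like the following (the generation settings are assumptions, and the gated base model requires accepted license access):

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "LBK95/Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V4"

# Loads the base model (meta-llama/Llama-2-7b-hf, gated) and applies the adapter.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Explain direct preference optimization in one sentence.",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```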

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged DPOTrainer sketch using them follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
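
As promised above, here is a minimal sketch mapping these hyperparameters onto TRL's DPOTrainer. The training dataset, DPO beta, LoRA configuration, and TRL version are not stated in this card, so everything marked as an assumption below is illustrative:

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token

# Assumption: a preference dataset with "prompt", "chosen", "rejected" columns.
train_dataset = load_dataset("json", data_files="preferences.json")["train"]

args = DPOConfig(
    output_dir="Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V4",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,  # total train batch size 4, as listed above
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=3,
    seed=42,
    # Default optimizer matches the Adam settings listed above
    # (betas=(0.9, 0.999), epsilon=1e-08).
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # named processing_class in newer TRL releases
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # LoRA settings assumed
)
trainer.train()
```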

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6179        | 0.3027 | 79   | 0.7115          | -0.1031        | -0.0593          | 0.25               | -0.0438         | -164.1966      | -138.2057    | 0.5429          | 0.5748        |
| 0.6065        | 0.6054 | 158  | 0.7348          | -0.0751        | 0.0129           | 0.25               | -0.0879         | -163.4753      | -137.9259    | 0.5242          | 0.5565        |
| 0.621         | 0.9080 | 237  | 0.7932          | -0.0433        | 0.1366           | 0.5                | -0.1800         | -162.2375      | -137.6083    | 0.4932          | 0.5259        |
| 0.4714        | 1.2107 | 316  | 0.7928          | -0.6963        | -0.5927          | 0.5                | -0.1037         | -169.5308      | -144.1387    | 0.4698          | 0.5037        |
| 0.3829        | 1.5134 | 395  | 0.8637          | -1.6604        | -1.5528          | 0.3333             | -0.1075         | -179.1323      | -153.7787    | 0.3664          | 0.4026        |
| 0.3589        | 1.8161 | 474  | 0.9222          | -1.4397        | -1.1360          | 0.25               | -0.3037         | -174.9637      | -151.5720    | 0.3400          | 0.3770        |
| 0.2138        | 2.1188 | 553  | 0.9860          | -1.9991        | -1.6486          | 0.3333             | -0.3505         | -180.0903      | -157.1666    | 0.2605          | 0.2992        |
| 0.0437        | 2.4215 | 632  | 1.1781          | -3.1628        | -2.7961          | 0.4167             | -0.3666         | -191.5652      | -168.8030    | 0.1441          | 0.1838        |
| 0.1667        | 2.7241 | 711  | 1.2125          | -3.3104        | -2.9319          | 0.4167             | -0.3786         | -192.9225      | -170.2794    | 0.1199          | 0.1595        |
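
For reference, the loss columns track the standard DPO objective (assuming TRL's default sigmoid loss variant, where $y_w$/$y_l$ are the chosen/rejected responses and $\beta$ is the DPO temperature):

$$
\mathcal{L}_{\text{DPO}}(\theta) = -\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)
$$

The term inside $\sigma$ is exactly Rewards/margins, which is why the rising validation loss tracks the increasingly negative margins in the table.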

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.1.0
  • Tokenizers 0.19.1