
Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V3

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf, trained with DPO on an unspecified dataset. It achieves the following results on the evaluation set (a note on how the reward figures relate follows the list):

  • Loss: 1.2551
  • Rewards/chosen: -2.5518
  • Rewards/rejected: -2.2604
  • Rewards/accuracies: 0.25
  • Rewards/margins: -0.2914
  • Logps/rejected: -124.4866
  • Logps/chosen: -113.4918
  • Logits/rejected: -0.2003
  • Logits/chosen: -0.1939
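
The reward bookkeeping above follows TRL's DPO convention; the identity below is standard DPO arithmetic, not something stated in the card. Each reported reward is the β-scaled log-probability ratio between the policy and the frozen reference model, and the margin is simply the chosen reward minus the rejected one, which the numbers above satisfy:

```latex
% Implicit DPO reward (TRL convention); \beta is the DPO temperature.
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right]

% Margin = chosen reward - rejected reward, matching the card:
%   -2.5518 - (-2.2604) = -0.2914
\mathrm{margins} = r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}})
```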

Model description

Judging from the card itself, this repository holds a PEFT adapter for meta-llama/Llama-2-7b-hf trained with DPO (Direct Preference Optimization); no further description is provided.

Intended uses & limitations

More information needed
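
No usage guidance is given, so here is a minimal loading sketch rather than a documented recipe. It assumes this repository hosts a PEFT adapter for meta-llama/Llama-2-7b-hf (consistent with the PEFT framework version listed below); the prompt is a placeholder.

```python
# Minimal loading sketch (assumption: this repo is a PEFT adapter for
# meta-llama/Llama-2-7b-hf). The base model is gated, so an authenticated
# Hugging Face token with an accepted license is required.
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "LBK95/Llama-2-7b-hf-DPO-LookAhead-5_TTree1.4_TT0.9_TP0.7_TE0.2_V3"

# AutoPeftModelForCausalLM reads the adapter config, loads the base model,
# and applies the adapter weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```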

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
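
These values map one-to-one onto standard transformers training arguments. The sketch below reconstructs them with TRL's DPOTrainer, which is an assumption consistent with the DPO metrics above but not confirmed by the card; the dataset, beta, LoRA settings, and output_dir are placeholders.

```python
# Reproduction sketch only: the card lists hyperparameters, not the training
# script. DPOTrainer/DPOConfig (TRL) and the toy dataset are assumptions.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Placeholder preference data in the prompt/chosen/rejected format DPO expects.
train_dataset = Dataset.from_dict({
    "prompt": ["..."],
    "chosen": ["..."],
    "rejected": ["..."],
})

args = DPOConfig(
    output_dir="llama2-7b-dpo",        # hypothetical
    learning_rate=5e-5,
    per_device_train_batch_size=2,     # train_batch_size: 2
    per_device_eval_batch_size=2,      # eval_batch_size: 2
    gradient_accumulation_steps=2,     # total_train_batch_size: 4
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=3,
    seed=42,
    beta=0.1,                          # TRL default; not documented in the card
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # adapter settings undocumented
)
trainer.train()
```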

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7369 | 0.3016 | 76 | 0.7203 | 0.0395 | 0.0671 | 0.5 | -0.0275 | -101.2117 | -87.5782 | 0.3635 | 0.3676 |
| 0.7346 | 0.6032 | 152 | 0.7779 | 0.0272 | 0.0797 | 0.4167 | -0.0525 | -101.0857 | -87.7017 | 0.3423 | 0.3459 |
| 0.5938 | 0.9048 | 228 | 0.7684 | -0.0888 | -0.0049 | 0.25 | -0.0839 | -101.9318 | -88.8616 | 0.3591 | 0.3628 |
| 0.2822 | 1.2063 | 304 | 0.9053 | -0.5756 | -0.3819 | 0.5 | -0.1937 | -105.7012 | -93.7292 | 0.3058 | 0.3097 |
| 0.1938 | 1.5079 | 380 | 0.9300 | -0.8880 | -0.7094 | 0.3333 | -0.1786 | -108.9764 | -96.8538 | 0.2048 | 0.2097 |
| 0.6894 | 1.8095 | 456 | 1.0636 | -1.7609 | -1.5117 | 0.3333 | -0.2492 | -116.9998 | -105.5827 | 0.0583 | 0.0644 |
| 0.2845 | 2.1111 | 532 | 0.9900 | -1.5299 | -1.4094 | 0.3333 | -0.1206 | -115.9760 | -103.2727 | -0.0017 | 0.0048 |
| 0.0617 | 2.4127 | 608 | 1.1950 | -2.1986 | -1.9633 | 0.25 | -0.2353 | -121.5159 | -109.9597 | -0.1517 | -0.1451 |
| 0.1181 | 2.7143 | 684 | 1.2551 | -2.5518 | -2.2604 | 0.25 | -0.2914 | -124.4866 | -113.4918 | -0.2003 | -0.1939 |
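
Note that validation loss is lowest at the very first evaluation and climbs steadily afterwards, so the final adapter is not the best checkpoint by that metric. A small sketch (step/loss pairs copied from the table above) that picks the lowest-loss evaluation:

```python
# (step, validation loss) pairs transcribed from the training results table.
eval_history = [
    (76, 0.7203), (152, 0.7779), (228, 0.7684),
    (304, 0.9053), (380, 0.9300), (456, 1.0636),
    (532, 0.9900), (608, 1.1950), (684, 1.2551),
]

best_step, best_loss = min(eval_history, key=lambda pair: pair[1])
print(f"Lowest validation loss {best_loss} at step {best_step}")  # step 76
```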

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 3.1.0
  • Tokenizers 0.19.1