
Llama-2-7b-hf-DPO-LookAhead-0_TTree1.4_TT0.9_TP0.7_TE0.2_V4

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unspecified dataset. It achieves the following results on the evaluation set (see the note after the list for how the reward metrics are defined):

  • Loss: 0.4478
  • Rewards/chosen: -2.1727
  • Rewards/rejected: -3.1141
  • Rewards/accuracies: 0.75
  • Rewards/margins: 0.9413
  • Logps/rejected: -108.1851
  • Logps/chosen: -106.1621
  • Logits/rejected: -0.0537
  • Logits/chosen: -0.0626
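
For context, the Rewards/* metrics follow the DPO convention of an implicit, β-scaled log-probability ratio against a frozen reference model. The card does not report β or name the training framework (the metric names match TRL's DPOTrainer), so the block below is a restatement of the standard DPO objective (Rafailov et al., 2023), not a description of this run's exact configuration:

```latex
% Implicit DPO reward of completion y given prompt x (beta is not reported in this card):
%   r(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
\mathcal{L}_{\mathrm{DPO}}
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\left[
      \log \sigma\!\left( r(x, y_w) - r(x, y_l) \right)
    \right]
```

Under this convention, Rewards/margins is the mean of r(x, y_w) − r(x, y_l) over the evaluation pairs, and Rewards/accuracies is the fraction of pairs where the chosen completion's reward exceeds the rejected one's.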

Model description

More information needed

Intended uses & limitations

More information needed
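
In the absence of documented usage, the snippet below is a minimal, untested sketch of loading the adapter for text generation with the PEFT and Transformers versions listed under Framework versions. The base and adapter repo ids are taken from this card; the prompt is illustrative only.

```python
# Minimal inference sketch: attach this LoRA adapter to the base model.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "LBK95/Llama-2-7b-hf-DPO-LookAhead-0_TTree1.4_TT0.9_TP0.7_TE0.2_V4"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # DPO-trained adapter weights

inputs = tokenizer("Write a short haiku about rivers.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Note that meta-llama/Llama-2-7b-hf is a gated repository, so the base weights require accepting Meta's license on the Hugging Face Hub.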

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
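
The card does not name the training framework. As a non-authoritative sketch, the hyperparameters above map onto transformers.TrainingArguments as follows; the output_dir and optim values are assumptions, not taken from the card.

```python
from transformers import TrainingArguments

# Hedged reconstruction of the listed hyperparameters; not the author's actual script.
training_args = TrainingArguments(
    output_dir="Llama-2-7b-hf-DPO-LookAhead-0_TTree1.4_TT0.9_TP0.7_TE0.2_V4",  # assumed
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=2,  # effective total train batch size: 2 * 2 = 4
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=3,
    optim="adamw_torch",  # Adam with betas=(0.9, 0.999), epsilon=1e-08 (library defaults)
)
```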

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7058        | 0.3016 | 76   | 0.6991          | 0.0563         | 0.0630           | 0.5833             | -0.0067         | -76.4141       | -83.8712     | 0.4583          | 0.4571        |
| 0.7778        | 0.6032 | 152  | 0.6695          | -0.3797        | -0.4650          | 0.5833             | 0.0853          | -81.6941       | -88.2312     | 0.4087          | 0.4065        |
| 1.1444        | 0.9048 | 228  | 0.6343          | -0.5747        | -0.8035          | 0.6667             | 0.2288          | -85.0798       | -90.1819     | 0.3921          | 0.3901        |
| 0.3356        | 1.2063 | 304  | 0.5906          | -0.6785        | -1.0749          | 0.75               | 0.3964          | -87.7931       | -91.2192     | 0.3726          | 0.3707        |
| 0.2763        | 1.5079 | 380  | 0.5523          | -1.2776        | -1.8289          | 0.6667             | 0.5513          | -95.3333       | -97.2104     | 0.2582          | 0.2534        |
| 0.3627        | 1.8095 | 456  | 0.6087          | -0.9428        | -1.2234          | 0.6667             | 0.2806          | -89.2781       | -93.8623     | 0.2429          | 0.2369        |
| 0.2197        | 2.1111 | 532  | 0.4800          | -1.5304        | -2.2029          | 0.75               | 0.6724          | -99.0731       | -99.7390     | 0.0887          | 0.0802        |
| 0.1679        | 2.4127 | 608  | 0.4563          | -2.1014        | -2.9919          | 0.6667             | 0.8905          | -106.9635      | -105.4488    | -0.0385         | -0.0475       |
| 0.2841        | 2.7143 | 684  | 0.4478          | -2.1727        | -3.1141          | 0.75               | 0.9413          | -108.1851      | -106.1621    | -0.0537         | -0.0626       |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 3.1.0
  • Tokenizers 0.19.1