
Llama-2-7b-hf-DPO-LookAhead3_FullEval_TTree1.4_TLoop0.7_TEval0.2_Filter0.2_V1.0

This model is a DPO fine-tuned version of meta-llama/Llama-2-7b-hf, trained as a PEFT adapter on an unspecified dataset. It achieves the following results on the evaluation set (the reward columns are defined after the list):

  • Loss: 0.9877
  • Rewards/chosen: -2.3284
  • Rewards/rejected: -2.4506
  • Rewards/accuracies: 0.5
  • Rewards/margins: 0.1222
  • Logps/rejected: -132.7494
  • Logps/chosen: -125.7301
  • Logits/rejected: -0.6876
  • Logits/chosen: -0.6598
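These reward metrics follow TRL's DPO convention: the implicit reward for a completion is the β-scaled log-probability ratio between the trained policy and the frozen reference model, and the margin is simply the chosen reward minus the rejected reward (β itself is not recorded on this card):

$$
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right],
\qquad
\text{margins} = r_{\text{chosen}} - r_{\text{rejected}} = -2.3284 - (-2.4506) = 0.1222 .
$$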

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
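The card does not state the training stack, but the metric names above match TRL's DPOTrainer. As an illustrative sketch only, here is how the listed hyperparameters would map onto a trl DPOConfig; output_dir and beta are placeholders, not values taken from this card:

```python
from trl import DPOConfig

# Sketch only: maps the hyperparameters listed above onto TRL's DPOConfig.
# output_dir and beta are placeholders -- neither is recorded on this card.
config = DPOConfig(
    output_dir="llama2-7b-dpo-adapter",  # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=2,       # train_batch_size: 2
    per_device_eval_batch_size=2,        # eval_batch_size: 2
    gradient_accumulation_steps=2,       # 2 * 2 = total_train_batch_size of 4
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=3,
    seed=42,
    adam_beta1=0.9,                      # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    beta=0.1,                            # DPO temperature: TRL's default, not from the card
)
```

This config would then be passed to a DPOTrainer together with the base model, a preference dataset, and the tokenizer.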

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6798 | 0.3021 | 50 | 0.7124 | -0.0006 | 0.0284 | 0.5 | -0.0290 | -107.9596 | -102.4519 | -0.1526 | -0.1158 |
| 0.622 | 0.6042 | 100 | 0.6927 | 0.0459 | 0.0462 | 0.375 | -0.0004 | -107.7811 | -101.9873 | -0.1568 | -0.1192 |
| 0.6031 | 0.9063 | 150 | 0.6815 | 0.0644 | 0.0476 | 0.375 | 0.0168 | -107.7677 | -101.8019 | -0.2034 | -0.1651 |
| 0.3606 | 1.2085 | 200 | 0.8146 | -0.7471 | -0.7226 | 0.625 | -0.0245 | -115.4695 | -109.9166 | -0.3703 | -0.3356 |
| 0.3387 | 1.5106 | 250 | 0.6641 | -0.3875 | -0.5323 | 0.625 | 0.1448 | -113.5663 | -106.3212 | -0.3313 | -0.2957 |
| 0.1549 | 1.8127 | 300 | 0.6263 | -0.8093 | -1.0444 | 0.625 | 0.2351 | -118.6870 | -110.5388 | -0.3892 | -0.3537 |
| 0.0958 | 2.1148 | 350 | 0.7394 | -1.7451 | -2.0072 | 0.5 | 0.2621 | -128.3158 | -119.8970 | -0.5348 | -0.5043 |
| 0.0193 | 2.4169 | 400 | 0.9249 | -2.0984 | -2.2555 | 0.5 | 0.1571 | -130.7979 | -123.4299 | -0.6495 | -0.6219 |
| 0.3616 | 2.7190 | 450 | 0.9877 | -2.3284 | -2.4506 | 0.5 | 0.1222 | -132.7494 | -125.7301 | -0.6876 | -0.6598 |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.2
  • Pytorch 2.4.1+cu121
  • Datasets 3.0.0
  • Tokenizers 0.19.1
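
Since this repository is a PEFT adapter rather than full model weights, it must be loaded on top of the base model. A minimal sketch using the PEFT and Transformers versions above; the repo IDs are taken from this card, while the dtype, device placement, and generation settings are illustrative choices:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "LBK95/Llama-2-7b-hf-DPO-LookAhead3_FullEval_TTree1.4_TLoop0.7_TEval0.2_Filter0.2_V1.0"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attaches the DPO adapter

inputs = tokenizer("Hello, ", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```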

Model tree for LBK95/Llama-2-7b-hf-DPO-LookAhead3_FullEval_TTree1.4_TLoop0.7_TEval0.2_Filter0.2_V1.0

  • Base model: meta-llama/Llama-2-7b-hf
  • This model: a PEFT adapter on the base model