Llama-2-7b-dpo-10k

This model is a version of meta-llama/Llama-2-7b-hf fine-tuned with DPO (Direct Preference Optimization) on an unknown dataset. It achieves the following results on the evaluation set (a sketch of how the reward metrics are derived follows the list):

  • Loss: 0.7215
  • Rewards/real: 5.3782
  • Rewards/generated: 4.9113
  • Rewards/accuracies: 0.6923
  • Rewards/margins: 0.4668
  • Logps/generated: -113.1980
  • Logps/real: -125.7774
  • Logits/generated: -1.1385
  • Logits/real: -1.0466
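
The Rewards/* entries are the implicit rewards of DPO: β times the gap between the policy's and the reference model's log-probability of a completion, computed for the preferred ("real") and model-generated completions. A minimal sketch of how these metrics relate, assuming the standard DPO formulation; the β value is hypothetical, since the card does not report the one actually used:

```python
import torch
import torch.nn.functional as F

beta = 0.1  # hypothetical; the card does not report the beta actually used

def dpo_metrics(policy_logps_real, ref_logps_real,
                policy_logps_gen, ref_logps_gen):
    """Implicit DPO rewards and the metrics listed above, given summed
    per-sequence log-probabilities (the Logps/* quantities) from the
    policy and the frozen reference model."""
    rewards_real = beta * (policy_logps_real - ref_logps_real)
    rewards_generated = beta * (policy_logps_gen - ref_logps_gen)
    margins = rewards_real - rewards_generated    # Rewards/margins
    accuracies = (margins > 0).float().mean()     # Rewards/accuracies
    loss = -F.logsigmoid(margins).mean()          # DPO loss
    return rewards_real, rewards_generated, margins, accuracies, loss
```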

Model description

More information needed

Intended uses & limitations

More information needed
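
The card does not document intended uses, but as a causal language model the checkpoint can be loaded for text generation with the standard transformers API. A minimal sketch, assuming the repository id of this model; adjust dtype and device handling to your setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AmberYifan/Llama-2-7b-dpo-10k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint is stored in BF16
    device_map="auto",           # requires accelerate
)

prompt = "The key idea behind direct preference optimization is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```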

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
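
For reference, a sketch of how these values map onto transformers.TrainingArguments; the actual training script is not published, and the DPO-specific settings (trainer, β) are not documented here, so the output_dir and bf16 flag below are assumptions:

```python
from transformers import TrainingArguments

# Per-device batch sizes; with 4 GPUs and gradient accumulation of 2 these
# give the effective sizes listed above (train 4*4*2 = 32, eval 4*4 = 16).
args = TrainingArguments(
    output_dir="llama-2-7b-dpo-10k",  # hypothetical
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=3,
    bf16=True,  # assumed, consistent with the BF16 checkpoint
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the optimizer defaults.
)
```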

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
|---------------|--------|------|-----------------|--------------|-------------------|--------------------|-----------------|-----------------|------------|------------------|-------------|
| 0.8559        | 0.1984 | 62   | 0.8605          | 0.4128       | 0.4099            | 0.4808             | 0.0029          | -158.2126       | -175.4314  | -0.8219          | -0.6123     |
| 0.7999        | 0.3968 | 124  | 0.8323          | 1.5863       | 1.5154            | 0.5192             | 0.0709          | -147.1573       | -163.6966  | -0.8057          | -0.6067     |
| 0.7846        | 0.5952 | 186  | 0.7979          | 2.4470       | 2.3135            | 0.5577             | 0.1335          | -139.1767       | -155.0893  | -0.8686          | -0.6862     |
| 0.7916        | 0.7936 | 248  | 0.7819          | 3.0117       | 2.8464            | 0.6346             | 0.1653          | -133.8475       | -149.4422  | -0.9049          | -0.7322     |
| 0.7714        | 0.992  | 310  | 0.7630          | 3.4214       | 3.1941            | 0.6346             | 0.2273          | -130.3704       | -145.3455  | -0.9511          | -0.7905     |
| 0.678         | 1.1904 | 372  | 0.7552          | 3.9523       | 3.6931            | 0.6538             | 0.2592          | -125.3802       | -140.0360  | -0.9800          | -0.8279     |
| 0.6337        | 1.3888 | 434  | 0.7464          | 4.4541       | 4.1602            | 0.6346             | 0.2939          | -120.7093       | -135.0177  | -1.0279          | -0.8860     |
| 0.6575        | 1.5872 | 496  | 0.7352          | 4.8501       | 4.4918            | 0.6538             | 0.3583          | -117.3935       | -131.0585  | -1.0562          | -0.9285     |
| 0.6606        | 1.7856 | 558  | 0.7270          | 5.1119       | 4.7485            | 0.6538             | 0.3634          | -114.8267       | -128.4403  | -1.0969          | -0.9780     |
| 0.6319        | 1.984  | 620  | 0.7260          | 5.2581       | 4.8563            | 0.6538             | 0.4018          | -113.7479       | -126.9782  | -1.0953          | -0.9815     |
| 0.552         | 2.1824 | 682  | 0.7295          | 5.3469       | 4.9377            | 0.6731             | 0.4092          | -112.9344       | -126.0898  | -1.1133          | -1.0072     |
| 0.5541        | 2.3808 | 744  | 0.7229          | 5.4093       | 4.9819            | 0.6923             | 0.4274          | -112.4924       | -125.4664  | -1.1322          | -1.0330     |
| 0.5342        | 2.5792 | 806  | 0.7246          | 5.3967       | 4.9520            | 0.6923             | 0.4447          | -112.7909       | -125.5919  | -1.1353          | -1.0397     |
| 0.5318        | 2.7776 | 868  | 0.7229          | 5.3656       | 4.9040            | 0.6731             | 0.4615          | -113.2710       | -125.9033  | -1.1367          | -1.0427     |
| 0.5396        | 2.976  | 930  | 0.7215          | 5.3782       | 4.9113            | 0.6923             | 0.4668          | -113.1980       | -125.7774  | -1.1385          | -1.0466     |

Framework versions

  • Transformers 4.43.3
  • PyTorch 2.2.2+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size: 6.74B params (Safetensors, tensor type BF16)
