---
base_model: ./data/zephyr-7b-sft-full
tags:
  - alignment-handbook
  - generated_from_trainer
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
model-index:
  - name: zephyr-7b-dpo-full
    results: []
---

# zephyr-7b-dpo-full

This model is a fine-tuned version of `./data/zephyr-7b-sft-full` on the `HuggingFaceH4/ultrafeedback_binarized` dataset. It achieves the following results on the evaluation set:

- Loss: 0.5105
- Rewards/chosen: -1.7322
- Rewards/rejected: -3.3299
- Rewards/accuracies: 0.7619
- Rewards/margins: 1.5977
- Logps/rejected: -315.2173
- Logps/chosen: -359.3560
- Logits/rejected: -0.7333
- Logits/chosen: -0.7199
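
For reference, the reward columns here and in the training table below are the implicit DPO rewards as logged by `trl`'s `DPOTrainer` (assumed from the `alignment-handbook` tag; the training script itself is not part of this card): for a prompt $x$ and completion $y$,

$$ r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right), $$

where $\pi_{\mathrm{ref}}$ is the SFT reference model. `Rewards/margins` is the mean chosen-minus-rejected reward gap, and `Rewards/accuracies` is the fraction of evaluation pairs in which the chosen completion receives the higher reward.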

## Model description

More information needed

## Intended uses & limitations

More information needed
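
A minimal inference sketch with the standard `transformers` causal-LM API is shown below. The repository id is an assumption based on this repo's name and is not stated in the card itself; `device_map="auto"` additionally requires `accelerate`.

```python
# Minimal inference sketch (not from the original card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "chinoll/Yi-6b-200k-dpo"  # assumption: taken from this repo's name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: adjust to your hardware
    device_map="auto",           # requires the `accelerate` package
)

prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```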

## Training and evaluation data

More information needed
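
The metadata lists `HuggingFaceH4/ultrafeedback_binarized` as the training/evaluation dataset, which provides prompt/chosen/rejected preference pairs. A minimal loading sketch, assuming the split names given on that dataset's card (`train_prefs`/`test_prefs`):

```python
# Sketch for inspecting the preference data; split and column names are
# taken from the HuggingFaceH4/ultrafeedback_binarized dataset card.
from datasets import load_dataset

ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
train = ds["train_prefs"]  # preference pairs used for DPO training
test = ds["test_prefs"]    # held-out pairs for evaluation
print(train.column_names)  # expect fields such as "prompt", "chosen", "rejected"
print(train[0]["prompt"])
```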

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
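
These values map onto `transformers.TrainingArguments` roughly as in the sketch below; the `alignment-handbook` tag suggests a `trl` `DPOTrainer` recipe, but the actual script, DPO `beta`, and precision settings are not reported in this card and are omitted here.

```python
# Sketch only: maps the listed hyperparameters onto TrainingArguments.
# The full alignment-handbook-style run also configures the SFT reference
# model, the DPO beta, and dataset preprocessing, none of which appear here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,   # train_batch_size
    per_device_eval_batch_size=4,    # eval_batch_size
    seed=42,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
)
# Launched across 8 GPUs (distributed_type: multi-GPU), this reproduces the
# reported effective batch sizes: 8 * 8 = 64 for training and 8 * 4 = 32 for eval.
```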

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6517 | 0.1  | 100  | 0.6389 | -0.0070 | -0.1621 | 0.6905 | 0.1551 | -283.5396 | -342.1045 | -0.5321 | -0.4793 |
| 0.5605 | 0.21 | 200  | 0.5619 | -0.0146 | -0.6024 | 0.7381 | 0.5879 | -287.9430 | -342.1800 | -0.5264 | -0.4852 |
| 0.5581 | 0.31 | 300  | 0.5333 | -0.0290 | -0.8509 | 0.7540 | 0.8219 | -290.4272 | -342.3241 | -0.5108 | -0.4742 |
| 0.5467 | 0.41 | 400  | 0.5165 | -0.1986 | -1.1136 | 0.7698 | 0.9150 | -293.0540 | -344.0201 | -0.5404 | -0.5044 |
| 0.5223 | 0.52 | 500  | 0.5120 | -0.1374 | -1.1105 | 0.7659 | 0.9730 | -293.0233 | -343.4084 | -0.5315 | -0.4944 |
| 0.5265 | 0.62 | 600  | 0.5085 | -0.2099 | -1.2965 | 0.7698 | 1.0866 | -294.8834 | -344.1335 | -0.5350 | -0.4980 |
| 0.5342 | 0.72 | 700  | 0.4961 | -0.1152 | -1.1322 | 0.7738 | 1.0170 | -293.2408 | -343.1862 | -0.5509 | -0.5124 |
| 0.48   | 0.83 | 800  | 0.4913 | -0.1837 | -1.1984 | 0.7619 | 1.0148 | -293.9029 | -343.8708 | -0.5183 | -0.4760 |
| 0.517  | 0.93 | 900  | 0.4865 | -0.1696 | -1.2078 | 0.7659 | 1.0382 | -293.9965 | -343.7298 | -0.5289 | -0.4854 |
| 0.477  | 1.03 | 1000 | 0.4905 | -0.1084 | -1.2175 | 0.7619 | 1.1090 | -294.0931 | -343.1185 | -0.5469 | -0.5062 |
| 0.4033 | 1.14 | 1100 | 0.4870 | -0.1598 | -1.2266 | 0.7540 | 1.0668 | -294.1847 | -343.6326 | -0.5547 | -0.5138 |
| 0.3284 | 1.24 | 1200 | 0.4836 | -0.3432 | -1.5002 | 0.7817 | 1.1570 | -296.9207 | -345.4664 | -0.5812 | -0.5440 |
| 0.2574 | 1.34 | 1300 | 0.4861 | -0.5667 | -1.8467 | 0.7738 | 1.2801 | -300.3859 | -347.7009 | -0.5840 | -0.5523 |
| 0.2641 | 1.44 | 1400 | 0.4897 | -0.6824 | -1.9954 | 0.7698 | 1.3129 | -301.8724 | -348.8586 | -0.6308 | -0.6034 |
| 0.2424 | 1.55 | 1500 | 0.5010 | -0.8646 | -2.2932 | 0.7540 | 1.4286 | -304.8503 | -350.6802 | -0.6025 | -0.5800 |
| 0.2944 | 1.65 | 1600 | 0.4927 | -0.7608 | -2.1089 | 0.7659 | 1.3480 | -303.0073 | -349.6426 | -0.6171 | -0.5909 |
| 0.2958 | 1.75 | 1700 | 0.4913 | -0.8080 | -2.1126 | 0.7698 | 1.3046 | -303.0449 | -350.1146 | -0.6429 | -0.6156 |
| 0.2667 | 1.86 | 1800 | 0.4877 | -0.9185 | -2.2364 | 0.7619 | 1.3178 | -304.2823 | -351.2196 | -0.6212 | -0.5936 |
| 0.2494 | 1.96 | 1900 | 0.4853 | -0.8965 | -2.2705 | 0.75   | 1.3740 | -304.6238 | -350.9996 | -0.6262 | -0.6005 |
| 0.2631 | 2.06 | 2000 | 0.4869 | -0.7974 | -2.1804 | 0.7698 | 1.3830 | -303.7225 | -350.0081 | -0.6231 | -0.5974 |
| 0.1965 | 2.17 | 2100 | 0.4886 | -1.0005 | -2.3981 | 0.7540 | 1.3977 | -305.8999 | -352.0387 | -0.6557 | -0.6330 |
| 0.1711 | 2.27 | 2200 | 0.4910 | -1.1688 | -2.6422 | 0.7778 | 1.4734 | -308.3400 | -353.7221 | -0.6689 | -0.6486 |
| 0.1492 | 2.37 | 2300 | 0.5077 | -1.4306 | -3.0185 | 0.7778 | 1.5878 | -312.1035 | -356.3406 | -0.7016 | -0.6846 |
| 0.1448 | 2.48 | 2400 | 0.5113 | -1.6343 | -3.3087 | 0.7659 | 1.6744 | -315.0052 | -358.3771 | -0.7281 | -0.7164 |
| 0.1425 | 2.58 | 2500 | 0.5185 | -1.6767 | -3.4070 | 0.7698 | 1.7304 | -315.9888 | -358.8008 | -0.7207 | -0.7101 |
| 0.1661 | 2.68 | 2600 | 0.5144 | -1.6680 | -3.3881 | 0.7659 | 1.7201 | -315.7997 | -358.7144 | -0.7288 | -0.7184 |
| 0.1755 | 2.79 | 2700 | 0.5153 | -1.7546 | -3.3676 | 0.7619 | 1.6130 | -315.5941 | -359.5799 | -0.7388 | -0.7261 |
| 0.1677 | 2.89 | 2800 | 0.5120 | -1.7415 | -3.3279 | 0.7540 | 1.5863 | -315.1972 | -359.4494 | -0.7350 | -0.7219 |
| 0.1711 | 2.99 | 2900 | 0.5120 | -1.7362 | -3.3282 | 0.7619 | 1.5920 | -315.2005 | -359.3962 | -0.7329 | -0.7195 |

### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.1
- Datasets 2.14.7
- Tokenizers 0.14.1