# zephyr-7b-dpo-full
This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full), trained with Direct Preference Optimization (DPO); the training dataset is not recorded in the card metadata. It achieves the following results on the evaluation set (a short usage sketch follows the metrics):
- Loss: 0.7337
- Rewards/chosen: -4.9100
- Rewards/rejected: -8.6806
- Rewards/accuracies: 0.7720
- Rewards/margins: 3.7705
- Logps/rejected: -315.2896
- Logps/chosen: -320.2513
- Logits/rejected: -2.5449
- Logits/chosen: -2.5953
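A minimal usage sketch for loading the model with the `transformers` library; the dtype, device placement, and generation settings below are illustrative choices rather than values from the training recipe:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "timlim123/zephyr-7b-dpo-full"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; choose a dtype your hardware supports
    device_map="auto",
)

# Zephyr models are chat models, so format the prompt with the chat template.
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```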
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged configuration sketch follows the list):
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
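The original training script is not included in this card, so the following is a hypothetical sketch of how the hyperparameters above might map onto a `transformers` `TrainingArguments` object; `output_dir` and `bf16` are assumptions, and the Adam settings listed above are the library defaults, so they need no explicit arguments:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction: only the values listed above come from the card.
training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",  # assumption: not stated in the card
    learning_rate=5e-7,
    per_device_train_batch_size=8,    # train_batch_size
    per_device_eval_batch_size=4,     # eval_batch_size
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=3,
    bf16=True,                        # assumption: common for 7B DPO runs
)
# Launched across 4 GPUs (distributed_type: multi-GPU), a per-device train
# batch of 8 gives the reported total_train_batch_size of 32 (8 * 4 devices).
```

In the alignment-handbook DPO recipe, arguments like these would typically be passed to trl's `DPOTrainer` together with the SFT model and a frozen reference copy.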
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
0.6144 | 0.05 | 100 | 0.5938 | 0.0567 | -0.2214 | 0.7220 | 0.2780 | -230.6976 | -270.5843 | -3.0045 | -3.0186 |
0.4957 | 0.1 | 200 | 0.5132 | 0.0606 | -0.7482 | 0.7460 | 0.8088 | -235.9661 | -270.5448 | -2.9556 | -2.9714 |
0.5257 | 0.15 | 300 | 0.4975 | -0.0361 | -1.0262 | 0.7520 | 0.9901 | -238.7455 | -271.5117 | -2.9853 | -2.9989 |
0.556 | 0.21 | 400 | 0.4935 | -0.1016 | -1.1994 | 0.7760 | 1.0978 | -240.4776 | -272.1671 | -3.0847 | -3.0931 |
0.5409 | 0.26 | 500 | 0.4953 | -0.4001 | -1.5875 | 0.7780 | 1.1874 | -244.3592 | -275.1525 | -3.0544 | -3.0767 |
0.5161 | 0.31 | 600 | 0.5195 | -0.3148 | -1.4151 | 0.7420 | 1.1003 | -242.6347 | -274.2988 | -3.0235 | -3.0461 |
0.4913 | 0.36 | 700 | 0.5228 | -0.5853 | -1.8669 | 0.7800 | 1.2816 | -247.1535 | -277.0044 | -2.9302 | -2.9586 |
0.4724 | 0.41 | 800 | 0.5142 | -0.6071 | -2.0565 | 0.7620 | 1.4494 | -249.0490 | -277.2221 | -2.7988 | -2.8297 |
0.5157 | 0.46 | 900 | 0.5050 | -0.5865 | -1.8166 | 0.7660 | 1.2302 | -246.6503 | -277.0157 | -2.9463 | -2.9778 |
0.4641 | 0.52 | 1000 | 0.5091 | -0.5151 | -1.9977 | 0.7580 | 1.4826 | -248.4611 | -276.3019 | -2.8916 | -2.9216 |
0.5558 | 0.57 | 1100 | 0.4971 | -0.8116 | -2.1120 | 0.7700 | 1.3004 | -249.6036 | -279.2668 | -2.8601 | -2.8914 |
0.4877 | 0.62 | 1200 | 0.5092 | -0.5596 | -1.8948 | 0.7640 | 1.3352 | -247.4319 | -276.7474 | -2.8340 | -2.8770 |
0.4922 | 0.67 | 1300 | 0.5181 | -0.9340 | -2.3745 | 0.7460 | 1.4405 | -252.2287 | -280.4910 | -2.8187 | -2.8517 |
0.5515 | 0.72 | 1400 | 0.5081 | -0.9873 | -2.2119 | 0.7440 | 1.2247 | -250.6034 | -281.0239 | -2.8488 | -2.8704 |
0.4349 | 0.77 | 1500 | 0.4996 | -0.9048 | -2.4262 | 0.7580 | 1.5214 | -252.7459 | -280.1994 | -2.8402 | -2.8601 |
0.5446 | 0.83 | 1600 | 0.4927 | -0.8717 | -2.4390 | 0.7660 | 1.5673 | -252.8737 | -279.8681 | -2.7610 | -2.7853 |
0.5242 | 0.88 | 1700 | 0.4864 | -0.6984 | -2.1381 | 0.7780 | 1.4397 | -249.8655 | -278.1355 | -2.8269 | -2.8525 |
0.5266 | 0.93 | 1800 | 0.5020 | -0.5411 | -1.9479 | 0.7760 | 1.4068 | -247.9628 | -276.5621 | -2.7381 | -2.7715 |
0.498 | 0.98 | 1900 | 0.5086 | -0.6894 | -2.0331 | 0.7640 | 1.3437 | -248.8150 | -278.0452 | -2.7298 | -2.7664 |
0.0664 | 1.03 | 2000 | 0.5137 | -1.1702 | -3.1723 | 0.7620 | 2.0021 | -260.2072 | -282.8530 | -2.6137 | -2.6605 |
0.0698 | 1.08 | 2100 | 0.5327 | -1.3645 | -3.5669 | 0.7680 | 2.2023 | -264.1527 | -284.7966 | -2.6219 | -2.6692 |
0.0715 | 1.14 | 2200 | 0.5423 | -2.0519 | -4.1983 | 0.7620 | 2.1464 | -270.4673 | -291.6701 | -2.6949 | -2.7397 |
0.0548 | 1.19 | 2300 | 0.5459 | -1.7539 | -4.0546 | 0.7700 | 2.3007 | -269.0301 | -288.6898 | -2.5996 | -2.6425 |
0.0897 | 1.24 | 2400 | 0.5317 | -1.6549 | -3.7228 | 0.7640 | 2.0679 | -265.7117 | -287.7002 | -2.6512 | -2.6870 |
0.0842 | 1.29 | 2500 | 0.5710 | -2.3000 | -4.5267 | 0.7660 | 2.2267 | -273.7511 | -294.1512 | -2.6530 | -2.6843 |
0.1321 | 1.34 | 2600 | 0.5334 | -1.8238 | -3.8561 | 0.7500 | 2.0323 | -267.0450 | -289.3895 | -2.7094 | -2.7343 |
0.0862 | 1.39 | 2700 | 0.5443 | -1.8480 | -3.9514 | 0.7520 | 2.1034 | -267.9976 | -289.6307 | -2.6953 | -2.7169 |
0.0954 | 1.45 | 2800 | 0.5472 | -1.9317 | -3.9982 | 0.7620 | 2.0665 | -268.4658 | -290.4683 | -2.6900 | -2.7121 |
0.0979 | 1.5 | 2900 | 0.5471 | -2.1452 | -4.1979 | 0.7540 | 2.0526 | -270.4626 | -292.6034 | -2.6466 | -2.6788 |
0.0732 | 1.55 | 3000 | 0.5512 | -2.0252 | -4.2019 | 0.7500 | 2.1767 | -270.5027 | -291.4029 | -2.6716 | -2.6981 |
0.0799 | 1.6 | 3100 | 0.5415 | -1.8888 | -3.8739 | 0.7500 | 1.9851 | -267.2229 | -290.0393 | -2.6703 | -2.7143 |
0.07 | 1.65 | 3200 | 0.5399 | -1.8457 | -4.0299 | 0.7640 | 2.1843 | -268.7833 | -289.6078 | -2.6566 | -2.7002 |
0.0808 | 1.7 | 3300 | 0.5594 | -2.2307 | -4.6355 | 0.7640 | 2.4048 | -274.8385 | -293.4576 | -2.6843 | -2.7340 |
0.0501 | 1.76 | 3400 | 0.5704 | -2.5155 | -4.9551 | 0.7660 | 2.4396 | -278.0345 | -296.3059 | -2.6427 | -2.6944 |
0.061 | 1.81 | 3500 | 0.5562 | -2.2172 | -4.4937 | 0.7600 | 2.2765 | -273.4208 | -293.3234 | -2.7086 | -2.7404 |
0.0979 | 1.86 | 3600 | 0.5656 | -2.6495 | -5.0323 | 0.7520 | 2.3828 | -278.8068 | -297.6461 | -2.6381 | -2.6765 |
0.0631 | 1.91 | 3700 | 0.5668 | -2.5055 | -4.7949 | 0.7560 | 2.2895 | -276.4331 | -296.2057 | -2.6407 | -2.6818 |
0.1202 | 1.96 | 3800 | 0.5678 | -2.6581 | -4.7249 | 0.7580 | 2.0668 | -275.7330 | -297.7322 | -2.6716 | -2.7125 |
0.022 | 2.01 | 3900 | 0.5657 | -2.6893 | -5.1672 | 0.7720 | 2.4778 | -280.1555 | -298.0444 | -2.6680 | -2.7125 |
0.0177 | 2.07 | 4000 | 0.6171 | -3.3461 | -6.2908 | 0.7680 | 2.9447 | -291.3919 | -304.6117 | -2.6431 | -2.6916 |
0.0108 | 2.12 | 4100 | 0.6389 | -3.3448 | -6.3803 | 0.7660 | 3.0355 | -292.2874 | -304.5994 | -2.6225 | -2.6701 |
0.0108 | 2.17 | 4200 | 0.6562 | -3.5386 | -6.6028 | 0.7620 | 3.0642 | -294.5121 | -306.5373 | -2.6323 | -2.6797 |
0.0105 | 2.22 | 4300 | 0.6742 | -3.7048 | -6.8992 | 0.7560 | 3.1944 | -297.4764 | -308.1995 | -2.6192 | -2.6678 |
0.018 | 2.27 | 4400 | 0.6982 | -4.1642 | -7.4837 | 0.7680 | 3.3195 | -303.3213 | -312.7930 | -2.5975 | -2.6454 |
0.0173 | 2.32 | 4500 | 0.6661 | -3.9139 | -6.9481 | 0.7660 | 3.0342 | -297.9650 | -310.2904 | -2.5967 | -2.6394 |
0.011 | 2.37 | 4600 | 0.6606 | -3.7121 | -6.8279 | 0.7640 | 3.1158 | -296.7630 | -308.2721 | -2.5628 | -2.6068 |
0.0096 | 2.43 | 4700 | 0.6705 | -3.9088 | -7.1613 | 0.7680 | 3.2524 | -300.0965 | -310.2393 | -2.5127 | -2.5613 |
0.0099 | 2.48 | 4800 | 0.6825 | -3.9836 | -7.2552 | 0.7720 | 3.2716 | -301.0364 | -310.9875 | -2.5169 | -2.5658 |
0.0106 | 2.53 | 4900 | 0.6938 | -4.2534 | -7.7587 | 0.7660 | 3.5053 | -306.0710 | -313.6849 | -2.5330 | -2.5844 |
0.0106 | 2.58 | 5000 | 0.6949 | -4.2978 | -7.7919 | 0.7660 | 3.4942 | -306.4034 | -314.1288 | -2.5330 | -2.5826 |
0.0099 | 2.63 | 5100 | 0.7239 | -4.3508 | -8.0105 | 0.7640 | 3.6598 | -308.5892 | -314.6587 | -2.5095 | -2.5620 |
0.0074 | 2.68 | 5200 | 0.7394 | -4.7364 | -8.4819 | 0.7660 | 3.7456 | -313.3035 | -318.5147 | -2.5378 | -2.5891 |
0.0043 | 2.74 | 5300 | 0.7335 | -4.6351 | -8.3990 | 0.7720 | 3.7639 | -312.4740 | -317.5019 | -2.5539 | -2.6052 |
0.0163 | 2.79 | 5400 | 0.7317 | -4.6741 | -8.3958 | 0.7700 | 3.7217 | -312.4420 | -317.8924 | -2.5490 | -2.5993 |
0.0081 | 2.84 | 5500 | 0.7420 | -4.9166 | -8.6945 | 0.7740 | 3.7779 | -315.4291 | -320.3167 | -2.5307 | -2.5816 |
0.0067 | 2.89 | 5600 | 0.7369 | -4.9581 | -8.7224 | 0.7680 | 3.7643 | -315.7077 | -320.7321 | -2.5437 | -2.5941 |
0.0081 | 2.94 | 5700 | 0.7345 | -4.9719 | -8.7499 | 0.7720 | 3.7780 | -315.9826 | -320.8700 | -2.5442 | -2.5946 |
0.0043 | 2.99 | 5800 | 0.7338 | -4.9141 | -8.6850 | 0.7700 | 3.7709 | -315.3341 | -320.2925 | -2.5452 | -2.5956 |
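To interpret the reward columns: in DPO, the implicit reward of a completion is β times the gap between policy and reference log-probabilities, the margin is the chosen-minus-rejected reward, and accuracy is the fraction of pairs where the chosen reward is higher. A small illustrative sketch with made-up numbers (β = 0.1 is a common Zephyr-recipe default; the card does not state the value used):

```python
import torch

def implicit_dpo_reward(policy_logps, ref_logps, beta=0.1):
    """Implicit DPO reward: beta * (log pi_theta(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logps - ref_logps)

# Toy sequence log-probabilities for two preference pairs (made-up numbers).
policy_chosen = torch.tensor([-320.0, -318.5])
ref_chosen = torch.tensor([-271.0, -270.2])
policy_rejected = torch.tensor([-315.0, -316.4])
ref_rejected = torch.tensor([-230.0, -231.1])

chosen_r = implicit_dpo_reward(policy_chosen, ref_chosen)        # Rewards/chosen
rejected_r = implicit_dpo_reward(policy_rejected, ref_rejected)  # Rewards/rejected
print("margin:", (chosen_r - rejected_r).mean().item())          # Rewards/margins
print("accuracy:", (chosen_r > rejected_r).float().mean().item())  # Rewards/accuracies
```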
### Framework versions
- Transformers 4.35.0
- Pytorch 2.1.0
- Datasets 2.14.6
- Tokenizers 0.14.1
## Model tree for timlim123/zephyr-7b-dpo-full

- Base model: [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
- Fine-tuned from: [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full)