
zephyr-7b-dpo-qlora

This model is a DPO fine-tuned version of alignment-handbook/zephyr-7b-sft-qlora, trained as a QLoRA adapter on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4888
  • Rewards/chosen: -3.3026
  • Rewards/rejected: -4.6171
  • Rewards/accuracies: 0.7510
  • Rewards/margins: 1.3145
  • Logps/rejected: -706.2916
  • Logps/chosen: -594.8843
  • Logits/rejected: 1.7556
  • Logits/chosen: 1.0124
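
Below is a minimal inference sketch using PEFT's `AutoPeftModelForCausalLM`, which resolves the base model from the adapter config. It assumes the adapter repository (WL2928/zephyr-7b-dpo-qlora) also ships the tokenizer and a Zephyr-style chat template; if it does not, load those from the base SFT model instead.

```python
# Minimal inference sketch (not from the original card). Assumes the adapter repo
# contains the tokenizer and a chat template; adjust if yours does not.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "WL2928/zephyr-7b-dpo-qlora"

# Loads the base model named in the adapter config and attaches the LoRA weights.
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO in one sentence."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```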

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
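
The sketch below illustrates how these settings could map onto a TRL `DPOTrainer` run in the alignment-handbook style; it is not the exact training script. The LoRA rank/alpha, DPO beta, and maximum sequence lengths are not listed above and appear as placeholders, and `model`, `tokenizer`, `train_ds`, and `eval_ds` are assumed to be prepared beforehand (the SFT policy loaded in 4-bit, and the preference data already rendered to plain-text prompt/chosen/rejected columns).

```python
# Illustrative sketch only: maps the hyperparameters listed above onto a TRL
# DPOTrainer call. LoRA rank/alpha, beta, and sequence lengths are placeholders,
# not values reported on this card; trl argument names vary between versions.
from peft import LoraConfig
from transformers import TrainingArguments
from trl import DPOTrainer


def build_dpo_trainer(model, tokenizer, train_ds, eval_ds):
    """`model` is assumed to be the SFT policy loaded in 4-bit for QLoRA; the
    datasets are assumed to hold plain-text prompt/chosen/rejected columns built
    from HuggingFaceH4/ultrafeedback_binarized via the chat template."""
    training_args = TrainingArguments(
        output_dir="zephyr-7b-dpo-qlora",
        learning_rate=5e-6,
        per_device_train_batch_size=2,
        per_device_eval_batch_size=4,
        gradient_accumulation_steps=4,  # 2 per device x 4 steps (x GPUs) -> total batch size 8
        num_train_epochs=1,
        lr_scheduler_type="cosine",
        warmup_ratio=0.1,
        seed=42,
        # Adam betas (0.9, 0.999) and epsilon 1e-08 are the default optimizer settings.
    )
    peft_config = LoraConfig(  # placeholder QLoRA adapter config, not from the card
        r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM"
    )
    return DPOTrainer(
        model,
        ref_model=None,          # with a PEFT adapter, the frozen base model acts as the reference
        args=training_args,
        beta=0.1,                # placeholder; the beta used for this run is not reported
        train_dataset=train_ds,
        eval_dataset=eval_ds,
        tokenizer=tokenizer,
        peft_config=peft_config,
    )
```

Calling `build_dpo_trainer(...).train()` would then run the single DPO epoch summarized in the results table below.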

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6885 | 0.01 | 100 | 0.6887 | 0.0401 | 0.0310 | 0.6155 | 0.0091 | -241.4763 | -260.6096 | -2.3013 | -2.3864 |
| 0.6826 | 0.03 | 200 | 0.6777 | 0.0538 | 0.0208 | 0.6555 | 0.0329 | -242.4942 | -259.2415 | -2.2939 | -2.3792 |
| 0.6623 | 0.04 | 300 | 0.6578 | -0.0931 | -0.1758 | 0.6735 | 0.0827 | -262.1588 | -273.9337 | -2.2310 | -2.3202 |
| 0.6619 | 0.05 | 400 | 0.6455 | -0.2994 | -0.4240 | 0.6610 | 0.1245 | -286.9754 | -294.5644 | -2.0309 | -2.1441 |
| 0.6257 | 0.07 | 500 | 0.6194 | -0.3522 | -0.5612 | 0.6850 | 0.2089 | -300.6967 | -299.8442 | -2.0400 | -2.1485 |
| 0.6114 | 0.08 | 600 | 0.6004 | -0.6308 | -0.9602 | 0.6755 | 0.3295 | -340.6012 | -327.6964 | -1.5503 | -1.7200 |
| 0.5394 | 0.09 | 700 | 0.6103 | -1.5690 | -1.9843 | 0.6635 | 0.4153 | -443.0096 | -421.5208 | -0.6532 | -0.9309 |
| 0.6171 | 0.1 | 800 | 0.6372 | -1.7546 | -2.0641 | 0.6405 | 0.3095 | -450.9858 | -440.0762 | 0.0235 | -0.3349 |
| 0.5553 | 0.12 | 900 | 0.5687 | -1.3500 | -1.8540 | 0.6930 | 0.5041 | -429.9809 | -399.6168 | 2.6187 | 1.9978 |
| 0.6299 | 0.13 | 1000 | 0.5620 | -1.1629 | -1.7464 | 0.6975 | 0.5835 | -419.2182 | -380.9113 | 3.4192 | 2.7155 |
| 0.5898 | 0.14 | 1100 | 0.5619 | -2.4368 | -3.0963 | 0.7090 | 0.6594 | -554.2042 | -508.3033 | 5.3078 | 4.4134 |
| 0.4782 | 0.16 | 1200 | 0.5594 | -1.5060 | -2.2383 | 0.7090 | 0.7323 | -468.4132 | -415.2229 | 4.0187 | 3.1485 |
| 0.5709 | 0.17 | 1300 | 0.5481 | -1.7316 | -2.3668 | 0.7245 | 0.6352 | -481.2582 | -437.7783 | 4.1315 | 3.2570 |
| 0.5181 | 0.18 | 1400 | 0.5454 | -2.4857 | -3.3898 | 0.7140 | 0.9042 | -583.5640 | -513.1900 | 4.6977 | 3.6944 |
| 0.5495 | 0.2 | 1500 | 0.5428 | -2.5602 | -3.3574 | 0.7205 | 0.7972 | -580.3215 | -520.6432 | 4.1847 | 3.2888 |
| 0.574 | 0.21 | 1600 | 0.5638 | -2.7101 | -3.5446 | 0.7190 | 0.8346 | -599.0428 | -535.6277 | 4.9219 | 3.9304 |
| 0.4901 | 0.22 | 1700 | 0.5284 | -2.4900 | -3.3577 | 0.7335 | 0.8677 | -580.3493 | -513.6201 | 3.8220 | 2.9305 |
| 0.5149 | 0.24 | 1800 | 0.5408 | -1.7507 | -2.4663 | 0.7215 | 0.7156 | -491.2047 | -439.6899 | 2.0262 | 1.2751 |
| 0.6382 | 0.25 | 1900 | 0.5325 | -2.1268 | -2.9548 | 0.7255 | 0.8279 | -540.0542 | -477.3052 | 2.4039 | 1.4990 |
| 0.5178 | 0.26 | 2000 | 0.5276 | -1.4221 | -2.1526 | 0.7305 | 0.7305 | -459.8390 | -406.8324 | 1.5288 | 0.8157 |
| 0.524 | 0.27 | 2100 | 0.5663 | -2.7101 | -3.7077 | 0.7110 | 0.9976 | -615.3445 | -535.6266 | 2.5955 | 1.6625 |
| 0.523 | 0.29 | 2200 | 0.5422 | -2.2871 | -3.3438 | 0.7230 | 1.0567 | -578.9616 | -493.3343 | 3.5955 | 2.5436 |
| 0.5431 | 0.3 | 2300 | 0.5253 | -2.1932 | -3.2183 | 0.7340 | 1.0252 | -566.4124 | -483.9387 | 4.2433 | 3.2004 |
| 0.5147 | 0.31 | 2400 | 0.5132 | -2.8441 | -3.8795 | 0.7315 | 1.0354 | -632.5286 | -549.0342 | 4.6772 | 3.6861 |
| 0.4198 | 0.33 | 2500 | 0.5214 | -2.1756 | -3.1443 | 0.7290 | 0.9687 | -559.0054 | -482.1783 | 2.7950 | 1.8511 |
| 0.5994 | 0.34 | 2600 | 0.5188 | -3.1314 | -4.1849 | 0.7290 | 1.0535 | -663.0683 | -577.7604 | 3.4511 | 2.4450 |
| 0.4812 | 0.35 | 2700 | 0.5139 | -3.0136 | -4.1060 | 0.7455 | 1.0924 | -655.1821 | -565.9851 | 3.7760 | 2.7916 |
| 0.4696 | 0.37 | 2800 | 0.5137 | -2.2305 | -3.2368 | 0.7355 | 1.0063 | -568.2574 | -487.6709 | 2.6757 | 1.8289 |
| 0.5418 | 0.38 | 2900 | 0.5177 | -2.0641 | -3.1462 | 0.7345 | 1.0822 | -559.2020 | -471.0270 | 2.0189 | 1.1899 |
| 0.5068 | 0.39 | 3000 | 0.5096 | -2.4564 | -3.5648 | 0.7400 | 1.1084 | -601.0543 | -510.2569 | 2.8679 | 2.0023 |
| 0.4429 | 0.41 | 3100 | 0.5324 | -2.7544 | -3.8869 | 0.7180 | 1.1325 | -633.2682 | -540.0566 | 1.3309 | 0.6491 |
| 0.5977 | 0.42 | 3200 | 0.4963 | -2.8842 | -3.9825 | 0.7425 | 1.0983 | -642.8285 | -553.0416 | 2.0170 | 1.2328 |
| 0.5281 | 0.43 | 3300 | 0.5074 | -2.4254 | -3.5511 | 0.7325 | 1.1257 | -599.6907 | -507.1647 | 1.1826 | 0.4294 |
| 0.5114 | 0.44 | 3400 | 0.5197 | -2.8424 | -4.0833 | 0.7255 | 1.2409 | -652.9095 | -548.8630 | 2.1493 | 1.2128 |
| 0.4984 | 0.46 | 3500 | 0.5002 | -3.1997 | -4.4222 | 0.7450 | 1.2225 | -686.7951 | -584.5864 | 3.3502 | 2.4203 |
| 0.5723 | 0.47 | 3600 | 0.5010 | -3.0065 | -4.2439 | 0.7410 | 1.2374 | -668.9721 | -565.2749 | 3.1534 | 2.2598 |
| 0.5496 | 0.48 | 3700 | 0.5015 | -3.0581 | -4.3336 | 0.7395 | 1.2755 | -677.9391 | -570.4304 | 3.3120 | 2.4472 |
| 0.5106 | 0.5 | 3800 | 0.5013 | -3.5077 | -4.8209 | 0.7395 | 1.3132 | -726.6729 | -615.3915 | 2.7134 | 1.8547 |
| 0.376 | 0.51 | 3900 | 0.4995 | -3.2636 | -4.5260 | 0.7415 | 1.2624 | -697.1753 | -590.9803 | 2.7739 | 1.9628 |
| 0.4935 | 0.52 | 4000 | 0.4916 | -2.8251 | -3.9628 | 0.7465 | 1.1377 | -640.8605 | -547.1311 | 2.2899 | 1.5516 |
| 0.445 | 0.54 | 4100 | 0.4959 | -3.1300 | -4.4063 | 0.7480 | 1.2763 | -685.2046 | -577.6177 | 2.5949 | 1.8263 |
| 0.443 | 0.55 | 4200 | 0.5039 | -2.6104 | -3.9167 | 0.7345 | 1.3063 | -636.2510 | -525.6652 | 2.5643 | 1.7637 |
| 0.517 | 0.56 | 4300 | 0.5042 | -3.0608 | -4.4485 | 0.7375 | 1.3877 | -689.4330 | -570.7054 | 2.6212 | 1.8545 |
| 0.3693 | 0.58 | 4400 | 0.4969 | -3.2698 | -4.5598 | 0.7470 | 1.2900 | -700.5564 | -591.6002 | 2.5178 | 1.8051 |
| 0.481 | 0.59 | 4500 | 0.4893 | -2.8076 | -3.9614 | 0.7445 | 1.1537 | -640.7148 | -545.3853 | 2.0329 | 1.3648 |
| 0.4696 | 0.6 | 4600 | 0.4945 | -3.3369 | -4.5983 | 0.7465 | 1.2614 | -704.4065 | -598.3125 | 2.6733 | 1.9401 |
| 0.4437 | 0.62 | 4700 | 0.4940 | -2.8130 | -4.0860 | 0.7445 | 1.2730 | -653.1788 | -545.9229 | 2.0547 | 1.2696 |
| 0.4492 | 0.63 | 4800 | 0.4963 | -2.7727 | -4.0657 | 0.7465 | 1.2930 | -651.1524 | -541.8960 | 2.3393 | 1.5355 |
| 0.5163 | 0.64 | 4900 | 0.5017 | -3.3498 | -4.7649 | 0.7465 | 1.4150 | -721.0643 | -599.6019 | 2.0201 | 1.2216 |
| 0.488 | 0.65 | 5000 | 0.4917 | -3.2508 | -4.5623 | 0.7480 | 1.3115 | -700.8107 | -589.7007 | 1.9166 | 1.1418 |
| 0.3606 | 0.67 | 5100 | 0.4905 | -2.9757 | -4.2308 | 0.7460 | 1.2551 | -667.6595 | -562.1877 | 1.5031 | 0.7813 |
| 0.58 | 0.68 | 5200 | 0.4897 | -2.8783 | -4.1021 | 0.75 | 1.2239 | -654.7924 | -552.4492 | 1.2839 | 0.5850 |
| 0.5788 | 0.69 | 5300 | 0.4900 | -3.0607 | -4.2816 | 0.7490 | 1.2209 | -672.7391 | -570.6943 | 1.4059 | 0.7114 |
| 0.4138 | 0.71 | 5400 | 0.4910 | -3.3493 | -4.6193 | 0.7515 | 1.2701 | -706.5120 | -599.5464 | 1.6121 | 0.8970 |
| 0.5737 | 0.72 | 5500 | 0.4898 | -3.1843 | -4.4515 | 0.7480 | 1.2672 | -689.7249 | -583.0511 | 1.4061 | 0.6955 |
| 0.4249 | 0.73 | 5600 | 0.4918 | -3.3448 | -4.6778 | 0.7490 | 1.3330 | -712.3564 | -599.0980 | 1.7110 | 0.9558 |
| 0.5457 | 0.75 | 5700 | 0.4897 | -3.2784 | -4.5741 | 0.75 | 1.2957 | -701.9877 | -592.4562 | 1.7372 | 0.9922 |
| 0.5287 | 0.76 | 5800 | 0.4920 | -3.3167 | -4.6600 | 0.7495 | 1.3433 | -710.5778 | -596.2890 | 1.9802 | 1.2037 |
| 0.5286 | 0.77 | 5900 | 0.4919 | -3.2305 | -4.5655 | 0.7465 | 1.3350 | -701.1276 | -587.6722 | 1.9038 | 1.1361 |
| 0.5147 | 0.79 | 6000 | 0.4910 | -3.3145 | -4.6435 | 0.7505 | 1.3290 | -708.9319 | -596.0760 | 1.9303 | 1.1726 |
| 0.4478 | 0.8 | 6100 | 0.4886 | -3.2069 | -4.5013 | 0.7480 | 1.2944 | -694.7131 | -585.3105 | 1.7621 | 1.0186 |
| 0.5236 | 0.81 | 6200 | 0.4901 | -3.3207 | -4.6497 | 0.7495 | 1.3290 | -709.5499 | -596.6957 | 1.8309 | 1.0794 |
| 0.5079 | 0.82 | 6300 | 0.4890 | -3.3084 | -4.6220 | 0.7495 | 1.3137 | -706.7820 | -595.4583 | 1.7747 | 1.0322 |
| 0.4942 | 0.84 | 6400 | 0.4891 | -3.2621 | -4.5672 | 0.7495 | 1.3051 | -701.3010 | -590.8314 | 1.7716 | 1.0268 |
| 0.4688 | 0.85 | 6500 | 0.4891 | -3.2863 | -4.5956 | 0.7505 | 1.3093 | -704.1410 | -593.2547 | 1.7863 | 1.0402 |
| 0.5062 | 0.86 | 6600 | 0.4889 | -3.2923 | -4.6029 | 0.7485 | 1.3106 | -704.8691 | -593.8478 | 1.7695 | 1.0261 |
| 0.574 | 0.88 | 6700 | 0.4887 | -3.2779 | -4.5886 | 0.7495 | 1.3108 | -703.4429 | -592.4089 | 1.7573 | 1.0140 |
| 0.5737 | 0.89 | 6800 | 0.4887 | -3.2917 | -4.6042 | 0.7510 | 1.3124 | -704.9940 | -593.7938 | 1.7560 | 1.0126 |
| 0.4298 | 0.9 | 6900 | 0.4889 | -3.2985 | -4.6115 | 0.7505 | 1.3131 | -705.7332 | -594.4664 | 1.7563 | 1.0130 |
| 0.55 | 0.92 | 7000 | 0.4889 | -3.2997 | -4.6137 | 0.7505 | 1.3140 | -705.9527 | -594.5901 | 1.7567 | 1.0132 |
| 0.4123 | 0.93 | 7100 | 0.4889 | -3.3026 | -4.6168 | 0.7515 | 1.3142 | -706.2578 | -594.8819 | 1.7586 | 1.0151 |
| 0.5207 | 0.94 | 7200 | 0.4887 | -3.3049 | -4.6192 | 0.75 | 1.3143 | -706.5007 | -595.1128 | 1.7557 | 1.0126 |
| 0.4618 | 0.96 | 7300 | 0.4888 | -3.3019 | -4.6165 | 0.7515 | 1.3145 | -706.2247 | -594.8143 | 1.7552 | 1.0116 |
| 0.4826 | 0.97 | 7400 | 0.4889 | -3.3035 | -4.6177 | 0.7510 | 1.3142 | -706.3512 | -594.9731 | 1.7538 | 1.0108 |
| 0.3856 | 0.98 | 7500 | 0.4887 | -3.3043 | -4.6187 | 0.7515 | 1.3144 | -706.4486 | -595.0473 | 1.7544 | 1.0114 |
| 0.5369 | 0.99 | 7600 | 0.4886 | -3.3028 | -4.6175 | 0.7520 | 1.3147 | -706.3290 | -594.9012 | 1.7559 | 1.0126 |
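
The reward columns above are the implicit DPO rewards computed during evaluation: for each example, the reward is beta times the difference between the policy and reference log-probabilities of the response, so Rewards/margins is the mean chosen-minus-rejected gap and Rewards/accuracies is the fraction of pairs where the chosen response scores higher. A small sketch of that computation, with hypothetical per-sequence log-probability tensors and a placeholder beta:

```python
# Sketch of how the rewards/* columns are derived from per-sequence log-probs.
# The *_logps tensors are hypothetical inputs; beta is a placeholder value.
import torch

def dpo_reward_metrics(policy_chosen_logps, policy_rejected_logps,
                       ref_chosen_logps, ref_rejected_logps, beta=0.1):
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return {
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        "rewards/accuracies": (chosen_rewards > rejected_rewards).float().mean().item(),
        "rewards/margins": (chosen_rewards - rejected_rewards).mean().item(),
    }
```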

Framework versions

  • PEFT 0.8.2
  • Transformers 4.38.1
  • Pytorch 2.2.0
  • Datasets 2.17.1
  • Tokenizers 0.15.2