---
license: apache-2.0
base_model: amazingvince/zephyr-220m-sft-full
tags:
- generated_from_trainer
model-index:
- name: zephyr-220m-dpo-full
results: []
datasets:
- HuggingFaceH4/ultrafeedback_binarized
---
# zephyr-220m-dpo-full
This model is a fine-tuned version of [amazingvince/zephyr-220m-sft-full](https://huggingface.co/amazingvince/zephyr-220m-sft-full) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5608
- Rewards/chosen: 0.4691
- Rewards/rejected: -0.0455
- Rewards/accuracies: 0.6930
- Rewards/margins: 0.5145
- Logps/rejected: -438.4595
- Logps/chosen: -544.6858
- Logits/rejected: -4.0092
- Logits/chosen: -3.9839
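
These are the standard DPO diagnostics logged by the trainer. Under DPO, the implicit reward of a completion $y$ for a prompt $x$ is

$$\hat r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),$$

where $\pi_{\mathrm{ref}}$ is the frozen SFT reference model and $\beta$ is the DPO temperature (its value is not stated on this card). Rewards/accuracies is the fraction of evaluation pairs where the chosen completion's implicit reward exceeds the rejected one's, and rewards/margins is the mean difference between the two.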
## Model description
zephyr-220m-dpo-full is a 220M-parameter causal language model aligned with Direct Preference Optimization (DPO). It continues from the supervised fine-tuned checkpoint [amazingvince/zephyr-220m-sft-full](https://huggingface.co/amazingvince/zephyr-220m-sft-full), following the Zephyr recipe (SFT, then preference tuning) at a much smaller scale than the original 7B models.
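
A minimal usage sketch with 🤗 Transformers is below. It assumes the tokenizer ships a chat template inherited from the SFT base; if it does not, format the prompt manually.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amazingvince/zephyr-220m-dpo-full"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build a chat-formatted prompt; assumes a chat template is defined on the tokenizer.
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=True, top_p=0.9)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```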
## Intended uses & limitations
At 220M parameters, this model is intended for research and for experimenting with preference-tuning pipelines on modest hardware rather than production use. It has not been evaluated for factuality or safety, and like other preference-tuned models it can still produce incorrect or harmful outputs.
## Training and evaluation data
Training used [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized), a binarized version of UltraFeedback in which each prompt is paired with a chosen and a rejected completion. The evaluation metrics above are computed on its held-out preference split.
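
A quick way to inspect the preference splits (split and column names are those of the public dataset):

```python
from datasets import load_dataset

# Inspect the public preference splits.
ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
print(ds)                              # train_prefs/test_prefs carry the DPO pairs
print(ds["train_prefs"][0]["prompt"])  # each row has prompt, chosen, rejected
```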
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 16
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
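
The card does not include the training script; the following is a minimal sketch of how these hyperparameters map onto TRL's `DPOTrainer`. The use of TRL, the `beta` value, the precision flag, and the dataset preprocessing are assumptions, not details taken from the original run.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "amazingvince/zephyr-220m-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference data; chosen/rejected must be reduced to plain response strings
# before training (preprocessing omitted here for brevity).
train_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

args = TrainingArguments(
    output_dir="zephyr-220m-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,   # x2 GPUs -> total train batch size 16
    per_device_eval_batch_size=4,
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # assumption: precision is not stated on the card
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.1,                        # assumption: DPO default; beta is not stated
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```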
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6906 | 0.03 | 100 | 0.6932 | 0.0008 | 0.0007 | 0.4860 | 0.0002 | -437.9984 | -549.3683 | -4.0893 | -4.0515 |
| 0.6844 | 0.05 | 200 | 0.6855 | 0.0323 | 0.0173 | 0.5640 | 0.0150 | -437.8319 | -549.0540 | -4.0871 | -4.0501 |
| 0.6685 | 0.08 | 300 | 0.6675 | 0.1075 | 0.0537 | 0.6160 | 0.0538 | -437.4682 | -548.3016 | -4.0788 | -4.0432 |
| 0.6579 | 0.1 | 400 | 0.6426 | 0.2153 | 0.0941 | 0.6430 | 0.1212 | -437.0637 | -547.2234 | -4.0645 | -4.0309 |
| 0.6331 | 0.13 | 500 | 0.6241 | 0.2980 | 0.1106 | 0.6430 | 0.1874 | -436.8989 | -546.3970 | -4.0525 | -4.0221 |
| 0.6229 | 0.15 | 600 | 0.6138 | 0.3428 | 0.1103 | 0.6580 | 0.2325 | -436.9023 | -545.9487 | -4.0402 | -4.0116 |
| 0.6008 | 0.18 | 700 | 0.6053 | 0.3822 | 0.0970 | 0.6560 | 0.2852 | -437.0354 | -545.5550 | -4.0301 | -4.0042 |
| 0.5751 | 0.21 | 800 | 0.5998 | 0.4077 | 0.0879 | 0.6540 | 0.3198 | -437.1260 | -545.2994 | -4.0359 | -4.0099 |
| 0.6485 | 0.23 | 900 | 0.5922 | 0.4208 | 0.0655 | 0.6600 | 0.3553 | -437.3501 | -545.1683 | -4.0167 | -3.9936 |
| 0.6164 | 0.26 | 1000 | 0.5880 | 0.4046 | 0.0287 | 0.6620 | 0.3759 | -437.7182 | -545.3309 | -4.0092 | -3.9869 |
| 0.6225 | 0.28 | 1100 | 0.5852 | 0.4058 | 0.0110 | 0.6680 | 0.3948 | -437.8951 | -545.3189 | -4.0240 | -3.9984 |
| 0.6289 | 0.31 | 1200 | 0.5824 | 0.4127 | 0.0078 | 0.6670 | 0.4048 | -437.9265 | -545.2498 | -4.0253 | -3.9994 |
| 0.5818 | 0.34 | 1300 | 0.5818 | 0.4222 | 0.0097 | 0.6680 | 0.4125 | -437.9080 | -545.1544 | -4.0212 | -3.9953 |
| 0.567 | 0.36 | 1400 | 0.5797 | 0.4098 | -0.0141 | 0.6730 | 0.4238 | -438.1456 | -545.2791 | -4.0333 | -4.0062 |
| 0.5659 | 0.39 | 1500 | 0.5790 | 0.4204 | -0.0154 | 0.6780 | 0.4358 | -438.1591 | -545.1725 | -4.0245 | -3.9963 |
| 0.5993 | 0.41 | 1600 | 0.5783 | 0.4161 | -0.0285 | 0.6720 | 0.4446 | -438.2904 | -545.2161 | -4.0185 | -3.9907 |
| 0.5999 | 0.44 | 1700 | 0.5767 | 0.4067 | -0.0468 | 0.6840 | 0.4535 | -438.4729 | -545.3095 | -4.0207 | -3.9935 |
| 0.6004 | 0.46 | 1800 | 0.5731 | 0.4233 | -0.0394 | 0.6830 | 0.4627 | -438.3991 | -545.1437 | -4.0219 | -3.9944 |
| 0.5349 | 0.49 | 1900 | 0.5720 | 0.4285 | -0.0429 | 0.6830 | 0.4714 | -438.4335 | -545.0914 | -4.0295 | -4.0012 |
| 0.5377 | 0.52 | 2000 | 0.5702 | 0.4255 | -0.0540 | 0.6850 | 0.4795 | -438.5449 | -545.1220 | -4.0290 | -4.0009 |
| 0.4988 | 0.54 | 2100 | 0.5713 | 0.4347 | -0.0548 | 0.6840 | 0.4895 | -438.5533 | -545.0299 | -4.0317 | -4.0039 |
| 0.6093 | 0.57 | 2200 | 0.5706 | 0.4464 | -0.0456 | 0.6810 | 0.4920 | -438.4607 | -544.9128 | -4.0288 | -4.0014 |
| 0.5356 | 0.59 | 2300 | 0.5689 | 0.4484 | -0.0486 | 0.6880 | 0.4971 | -438.4912 | -544.8922 | -4.0257 | -3.9986 |
| 0.5753 | 0.62 | 2400 | 0.5681 | 0.4596 | -0.0441 | 0.6850 | 0.5037 | -438.4457 | -544.7802 | -4.0100 | -3.9846 |
| 0.5709 | 0.65 | 2500 | 0.5673 | 0.4693 | -0.0387 | 0.6910 | 0.5081 | -438.3924 | -544.6835 | -4.0100 | -3.9849 |
| 0.5565 | 0.67 | 2600 | 0.5665 | 0.4692 | -0.0401 | 0.6820 | 0.5092 | -438.4054 | -544.6850 | -4.0096 | -3.9843 |
| 0.585 | 0.7 | 2700 | 0.5650 | 0.4780 | -0.0351 | 0.6940 | 0.5131 | -438.3558 | -544.5962 | -4.0074 | -3.9820 |
| 0.5883 | 0.72 | 2800 | 0.5670 | 0.4914 | -0.0151 | 0.6880 | 0.5066 | -438.1562 | -544.4624 | -3.9894 | -3.9669 |
| 0.624 | 0.75 | 2900 | 0.5663 | 0.4877 | -0.0191 | 0.6840 | 0.5068 | -438.1958 | -544.4997 | -3.9935 | -3.9705 |
| 0.5347 | 0.77 | 3000 | 0.5644 | 0.4757 | -0.0335 | 0.6850 | 0.5092 | -438.3401 | -544.6199 | -4.0019 | -3.9777 |
| 0.5837 | 0.8 | 3100 | 0.5637 | 0.4783 | -0.0302 | 0.6830 | 0.5085 | -438.3073 | -544.5936 | -3.9976 | -3.9742 |
| 0.5293 | 0.83 | 3200 | 0.5634 | 0.4715 | -0.0363 | 0.6890 | 0.5078 | -438.3679 | -544.6616 | -4.0023 | -3.9778 |
| 0.5128 | 0.85 | 3300 | 0.5620 | 0.4745 | -0.0387 | 0.6880 | 0.5131 | -438.3917 | -544.6319 | -4.0053 | -3.9804 |
| 0.6204 | 0.88 | 3400 | 0.5625 | 0.4679 | -0.0442 | 0.6860 | 0.5121 | -438.4469 | -544.6978 | -4.0067 | -3.9815 |
| 0.5469 | 0.9 | 3500 | 0.5618 | 0.4612 | -0.0491 | 0.6860 | 0.5102 | -438.4956 | -544.7651 | -4.0098 | -3.9843 |
| 0.5807 | 0.93 | 3600 | 0.5615 | 0.4675 | -0.0454 | 0.6890 | 0.5129 | -438.4584 | -544.7015 | -4.0068 | -3.9818 |
| 0.5265 | 0.96 | 3700 | 0.5620 | 0.4675 | -0.0435 | 0.6880 | 0.5110 | -438.4403 | -544.7019 | -4.0082 | -3.9833 |
| 0.5484 | 0.98 | 3800 | 0.5615 | 0.4685 | -0.0449 | 0.6930 | 0.5133 | -438.4536 | -544.6919 | -4.0103 | -3.9851 |
### Framework versions
- Transformers 4.37.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0
Training run logs: https://wandb.ai/amazingvince/huggingface/runs/z71h0hc3?workspace=user-amazingvince