---
base_model: ./data/zephyr-7b-sft-full
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: zephyr-7b-dpo-full
  results: []
---


# zephyr-7b-dpo-full

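This model is a fine-tuned version of the local checkpoint `./data/zephyr-7b-sft-full` on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset, trained with Direct Preference Optimization (DPO), as the model name and the reward metrics below indicate.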
It achieves the following results on the evaluation set:
- Loss: 0.5105
- Rewards/chosen: -1.7322
- Rewards/rejected: -3.3299
- Rewards/accuracies: 0.7619
- Rewards/margins: 1.5977
- Logps/rejected: -315.2173
- Logps/chosen: -359.3560
- Logits/rejected: -0.7333
- Logits/chosen: -0.7199
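
The reward metrics follow the usual DPO bookkeeping: each response's implicit reward is beta times the difference between its log-probability under the policy and under the frozen reference model, and Rewards/margins is Rewards/chosen minus Rewards/rejected (here -1.7322 - (-3.3299) = 1.5977). The sketch below shows how these quantities relate, assuming the standard sigmoid DPO loss (the default in TRL's `DPOTrainer`) and per-sequence log-probabilities summed over completion tokens. The value of beta is not reported in this card, so `BETA = 0.1` is a placeholder, and the helper names are hypothetical. The Logits/* metrics (mean raw logits) are omitted.

```python
import math

# Assumption: beta is not reported in this card; 0.1 is only a placeholder value.
BETA = 0.1


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=BETA):
    """Sigmoid DPO loss plus the Rewards/* and Logps/* metrics for one batch.

    Each argument is a list of per-example sequence log-probabilities
    under either the policy or the frozen reference model.
    """
    n = len(policy_chosen)
    # Implicit reward: beta * (log pi_theta(y|x) - log pi_ref(y|x))
    rewards_chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rewards_rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(rewards_chosen, rewards_rejected)]
    return {
        # Loss is -log sigmoid(margin), averaged over the batch
        "loss": sum(-log_sigmoid(m) for m in margins) / n,
        "rewards/chosen": sum(rewards_chosen) / n,
        "rewards/rejected": sum(rewards_rejected) / n,
        "rewards/margins": sum(margins) / n,
        # Fraction of pairs where the chosen response gets the higher reward
        "rewards/accuracies": sum(m > 0 for m in margins) / n,
        "logps/chosen": sum(policy_chosen) / n,
        "logps/rejected": sum(policy_rejected) / n,
    }


# Before any update the policy equals the reference, so every margin is 0 and the loss is log(2)
start = dpo_metrics([-350.0], [-300.0], [-350.0], [-300.0])
print(round(start["loss"], 4))  # 0.6931
```

This also explains why the first logged validation loss (0.6389 at step 100) sits just below log(2) ≈ 0.693: training starts from zero margins.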

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
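
Two of these settings are derived from the others. The total batch sizes are the per-device sizes multiplied by the 8 devices (8 × 8 = 64 and 4 × 8 = 32), which implies a gradient accumulation of 1. The `linear` scheduler with `warmup_ratio: 0.1` ramps the learning rate from 0 up to 5e-07 over the first 10% of optimizer steps, then decays it linearly to 0. The sketch below reproduces that shape in plain Python; it is meant to mirror, as far as we understand it, how `transformers` rounds the warmup length up from the ratio. The helper names are hypothetical, and `TOTAL_STEPS` is an assumption, since the card does not report the exact number of optimizer steps.

```python
import math


def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    # 8 per device x 8 GPUs x 1 accumulation step = 64 (accumulation of 1 is inferred, not stated)
    return per_device * num_devices * grad_accum_steps


def linear_warmup_lr(step: int, total_steps: int, peak_lr: float = 5e-7, warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 up to peak_lr
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr down to 0 at total_steps
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


TOTAL_STEPS = 3000  # hypothetical: the exact step count is not given in this card
for step in (0, 150, 300, 1650, 3000):
    print(step, linear_warmup_lr(step, TOTAL_STEPS))
```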

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6517        | 0.1   | 100  | 0.6389          | -0.0070        | -0.1621          | 0.6905             | 0.1551          | -283.5396      | -342.1045    | -0.5321         | -0.4793       |
| 0.5605        | 0.21  | 200  | 0.5619          | -0.0146        | -0.6024          | 0.7381             | 0.5879          | -287.9430      | -342.1800    | -0.5264         | -0.4852       |
| 0.5581        | 0.31  | 300  | 0.5333          | -0.0290        | -0.8509          | 0.7540             | 0.8219          | -290.4272      | -342.3241    | -0.5108         | -0.4742       |
| 0.5467        | 0.41  | 400  | 0.5165          | -0.1986        | -1.1136          | 0.7698             | 0.9150          | -293.0540      | -344.0201    | -0.5404         | -0.5044       |
| 0.5223        | 0.52  | 500  | 0.5120          | -0.1374        | -1.1105          | 0.7659             | 0.9730          | -293.0233      | -343.4084    | -0.5315         | -0.4944       |
| 0.5265        | 0.62  | 600  | 0.5085          | -0.2099        | -1.2965          | 0.7698             | 1.0866          | -294.8834      | -344.1335    | -0.5350         | -0.4980       |
| 0.5342        | 0.72  | 700  | 0.4961          | -0.1152        | -1.1322          | 0.7738             | 1.0170          | -293.2408      | -343.1862    | -0.5509         | -0.5124       |
| 0.48          | 0.83  | 800  | 0.4913          | -0.1837        | -1.1984          | 0.7619             | 1.0148          | -293.9029      | -343.8708    | -0.5183         | -0.4760       |
| 0.517         | 0.93  | 900  | 0.4865          | -0.1696        | -1.2078          | 0.7659             | 1.0382          | -293.9965      | -343.7298    | -0.5289         | -0.4854       |
| 0.477         | 1.03  | 1000 | 0.4905          | -0.1084        | -1.2175          | 0.7619             | 1.1090          | -294.0931      | -343.1185    | -0.5469         | -0.5062       |
| 0.4033        | 1.14  | 1100 | 0.4870          | -0.1598        | -1.2266          | 0.7540             | 1.0668          | -294.1847      | -343.6326    | -0.5547         | -0.5138       |
| 0.3284        | 1.24  | 1200 | 0.4836          | -0.3432        | -1.5002          | 0.7817             | 1.1570          | -296.9207      | -345.4664    | -0.5812         | -0.5440       |
| 0.2574        | 1.34  | 1300 | 0.4861          | -0.5667        | -1.8467          | 0.7738             | 1.2801          | -300.3859      | -347.7009    | -0.5840         | -0.5523       |
| 0.2641        | 1.44  | 1400 | 0.4897          | -0.6824        | -1.9954          | 0.7698             | 1.3129          | -301.8724      | -348.8586    | -0.6308         | -0.6034       |
| 0.2424        | 1.55  | 1500 | 0.5010          | -0.8646        | -2.2932          | 0.7540             | 1.4286          | -304.8503      | -350.6802    | -0.6025         | -0.5800       |
| 0.2944        | 1.65  | 1600 | 0.4927          | -0.7608        | -2.1089          | 0.7659             | 1.3480          | -303.0073      | -349.6426    | -0.6171         | -0.5909       |
| 0.2958        | 1.75  | 1700 | 0.4913          | -0.8080        | -2.1126          | 0.7698             | 1.3046          | -303.0449      | -350.1146    | -0.6429         | -0.6156       |
| 0.2667        | 1.86  | 1800 | 0.4877          | -0.9185        | -2.2364          | 0.7619             | 1.3178          | -304.2823      | -351.2196    | -0.6212         | -0.5936       |
| 0.2494        | 1.96  | 1900 | 0.4853          | -0.8965        | -2.2705          | 0.75               | 1.3740          | -304.6238      | -350.9996    | -0.6262         | -0.6005       |
| 0.2631        | 2.06  | 2000 | 0.4869          | -0.7974        | -2.1804          | 0.7698             | 1.3830          | -303.7225      | -350.0081    | -0.6231         | -0.5974       |
| 0.1965        | 2.17  | 2100 | 0.4886          | -1.0005        | -2.3981          | 0.7540             | 1.3977          | -305.8999      | -352.0387    | -0.6557         | -0.6330       |
| 0.1711        | 2.27  | 2200 | 0.4910          | -1.1688        | -2.6422          | 0.7778             | 1.4734          | -308.3400      | -353.7221    | -0.6689         | -0.6486       |
| 0.1492        | 2.37  | 2300 | 0.5077          | -1.4306        | -3.0185          | 0.7778             | 1.5878          | -312.1035      | -356.3406    | -0.7016         | -0.6846       |
| 0.1448        | 2.48  | 2400 | 0.5113          | -1.6343        | -3.3087          | 0.7659             | 1.6744          | -315.0052      | -358.3771    | -0.7281         | -0.7164       |
| 0.1425        | 2.58  | 2500 | 0.5185          | -1.6767        | -3.4070          | 0.7698             | 1.7304          | -315.9888      | -358.8008    | -0.7207         | -0.7101       |
| 0.1661        | 2.68  | 2600 | 0.5144          | -1.6680        | -3.3881          | 0.7659             | 1.7201          | -315.7997      | -358.7144    | -0.7288         | -0.7184       |
| 0.1755        | 2.79  | 2700 | 0.5153          | -1.7546        | -3.3676          | 0.7619             | 1.6130          | -315.5941      | -359.5799    | -0.7388         | -0.7261       |
| 0.1677        | 2.89  | 2800 | 0.5120          | -1.7415        | -3.3279          | 0.7540             | 1.5863          | -315.1972      | -359.4494    | -0.7350         | -0.7219       |
| 0.1711        | 2.99  | 2900 | 0.5120          | -1.7362        | -3.3282          | 0.7619             | 1.5920          | -315.2005      | -359.3962    | -0.7329         | -0.7195       |
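
In this table, validation loss reaches its minimum of 0.4836 at step 1200 (epoch 1.24) and ends at 0.5120 at step 2900. Over the same period, training loss keeps falling and stays below 0.2 for most of the third epoch. The short sketch below picks the lowest-validation-loss step from the logged values, which are copied from the table. The helper name is hypothetical. For automating this during training, `transformers.TrainingArguments` exposes `load_best_model_at_end` and `metric_for_best_model`. Note that these settings only help if a checkpoint was saved at the corresponding step.

```python
# Validation loss per logged step, copied from the table above (steps 100 to 2900)
STEPS = list(range(100, 3000, 100))
VAL_LOSSES = [
    0.6389, 0.5619, 0.5333, 0.5165, 0.5120, 0.5085, 0.4961, 0.4913, 0.4865, 0.4905,
    0.4870, 0.4836, 0.4861, 0.4897, 0.5010, 0.4927, 0.4913, 0.4877, 0.4853, 0.4869,
    0.4886, 0.4910, 0.5077, 0.5113, 0.5185, 0.5144, 0.5153, 0.5120, 0.5120,
]
assert len(STEPS) == len(VAL_LOSSES)


def best_step(steps, losses):
    """Return (step, loss) for the lowest validation loss."""
    return min(zip(steps, losses), key=lambda pair: pair[1])


step, loss = best_step(STEPS, VAL_LOSSES)
print(f"lowest validation loss {loss} at step {step}")
```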


### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.1
- Datasets 2.14.7
- Tokenizers 0.14.1
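
When reproducing this run, it can help to confirm that the installed libraries match the versions above. The sketch below uses the standard-library `importlib.metadata` module to compare them. It assumes the PyPI distribution names (`torch` for PyTorch) and ignores local build tags such as `+cu121`. The helper and constant names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card; "Pytorch" is distributed on PyPI as "torch"
CARD_VERSIONS = {
    "transformers": "4.35.2",
    "torch": "2.1.1",
    "datasets": "2.14.7",
    "tokenizers": "0.14.1",
}


def version_mismatches(expected=CARD_VERSIONS):
    """Map each package whose installed version differs to (expected, installed or None)."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            # Drop local build tags such as "+cu121" before comparing
            installed = version(package).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


print(version_mismatches())  # an empty dict means the environment matches the card
```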