metadata

base_model: ./data/zephyr-7b-sft-full
tags:
  - alignment-handbook
  - generated_from_trainer
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
model-index:
  - name: zephyr-7b-dpo-full
    results: []

zephyr-7b-dpo-full

This model is a fine-tuned version of ./data/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

Loss: 0.5105
Rewards/chosen: -1.7322
Rewards/rejected: -3.3299
Rewards/accuracies: 0.7619
Rewards/margins: 1.5977
Logps/rejected: -315.2173
Logps/chosen: -359.3560
Logits/rejected: -0.7333
Logits/chosen: -0.7199
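
The reward figures above are related by simple arithmetic. The following is a minimal sketch, assuming the usual DPO trainer convention in which the margin is the mean of per-pair (chosen minus rejected) implicit rewards; under that assumption the reported margin should equal the difference of the two reported means. It uses only the numbers listed above.

```python
# Evaluation metrics copied from the list above.
rewards_chosen = -1.7322
rewards_rejected = -3.3299
reported_margin = 1.5977

# Assumption: margin = mean(chosen - rejected), which equals the
# difference of the two means when both are averaged over the same pairs.
computed_margin = rewards_chosen - rewards_rejected

print(f"computed margin: {computed_margin:.4f}")
print(f"matches reported: {abs(computed_margin - reported_margin) < 1e-4}")
```

A positive margin means that, on average, the model assigns a higher implicit reward to the chosen response than to the rejected one. Under the same assumed convention, Rewards/accuracies (0.7619) would be the fraction of evaluation pairs for which that holds.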

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 8
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 64
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
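
The batch-size totals and the learning-rate schedule above can be reproduced with a short sketch. Two things here are assumptions rather than facts from this card: gradient accumulation is taken to be 1 because it is not listed, and `TOTAL_STEPS` is a hypothetical value set to the last logged step in the results table (the true total may differ slightly). The `linear_lr` helper is illustrative and is not the trainer's own code.

```python
import math

# Values from the hyperparameter list above.
learning_rate = 5e-07
train_batch_size = 8      # per device
eval_batch_size = 4       # per device
num_devices = 8
warmup_ratio = 0.1

# Assumption: no gradient accumulation (it is not listed above).
gradient_accumulation_steps = 1

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def linear_lr(step: int, total_steps: int) -> float:
    """Linear warmup from 0 to the peak rate, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / max(1, total_steps - warmup_steps)


# Hypothetical total: 2900 is the last logged step, not a confirmed total.
TOTAL_STEPS = 2900

print("total train batch size:", total_train_batch_size)
print("total eval batch size:", total_eval_batch_size)
for step in (0, 145, 290, 1595, 2900):
    print(step, f"{linear_lr(step, TOTAL_STEPS):.2e}")
```

With a warmup ratio of 0.1, the rate would climb for roughly the first tenth of training, peak at 5e-07, and then decay linearly toward zero by the final step.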

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6517	0.1	100	0.6389	-0.0070	-0.1621	0.6905	0.1551	-283.5396	-342.1045	-0.5321	-0.4793
0.5605	0.21	200	0.5619	-0.0146	-0.6024	0.7381	0.5879	-287.9430	-342.1800	-0.5264	-0.4852
0.5581	0.31	300	0.5333	-0.0290	-0.8509	0.7540	0.8219	-290.4272	-342.3241	-0.5108	-0.4742
0.5467	0.41	400	0.5165	-0.1986	-1.1136	0.7698	0.9150	-293.0540	-344.0201	-0.5404	-0.5044
0.5223	0.52	500	0.5120	-0.1374	-1.1105	0.7659	0.9730	-293.0233	-343.4084	-0.5315	-0.4944
0.5265	0.62	600	0.5085	-0.2099	-1.2965	0.7698	1.0866	-294.8834	-344.1335	-0.5350	-0.4980
0.5342	0.72	700	0.4961	-0.1152	-1.1322	0.7738	1.0170	-293.2408	-343.1862	-0.5509	-0.5124
0.48	0.83	800	0.4913	-0.1837	-1.1984	0.7619	1.0148	-293.9029	-343.8708	-0.5183	-0.4760
0.517	0.93	900	0.4865	-0.1696	-1.2078	0.7659	1.0382	-293.9965	-343.7298	-0.5289	-0.4854
0.477	1.03	1000	0.4905	-0.1084	-1.2175	0.7619	1.1090	-294.0931	-343.1185	-0.5469	-0.5062
0.4033	1.14	1100	0.4870	-0.1598	-1.2266	0.7540	1.0668	-294.1847	-343.6326	-0.5547	-0.5138
0.3284	1.24	1200	0.4836	-0.3432	-1.5002	0.7817	1.1570	-296.9207	-345.4664	-0.5812	-0.5440
0.2574	1.34	1300	0.4861	-0.5667	-1.8467	0.7738	1.2801	-300.3859	-347.7009	-0.5840	-0.5523
0.2641	1.44	1400	0.4897	-0.6824	-1.9954	0.7698	1.3129	-301.8724	-348.8586	-0.6308	-0.6034
0.2424	1.55	1500	0.5010	-0.8646	-2.2932	0.7540	1.4286	-304.8503	-350.6802	-0.6025	-0.5800
0.2944	1.65	1600	0.4927	-0.7608	-2.1089	0.7659	1.3480	-303.0073	-349.6426	-0.6171	-0.5909
0.2958	1.75	1700	0.4913	-0.8080	-2.1126	0.7698	1.3046	-303.0449	-350.1146	-0.6429	-0.6156
0.2667	1.86	1800	0.4877	-0.9185	-2.2364	0.7619	1.3178	-304.2823	-351.2196	-0.6212	-0.5936
0.2494	1.96	1900	0.4853	-0.8965	-2.2705	0.75	1.3740	-304.6238	-350.9996	-0.6262	-0.6005
0.2631	2.06	2000	0.4869	-0.7974	-2.1804	0.7698	1.3830	-303.7225	-350.0081	-0.6231	-0.5974
0.1965	2.17	2100	0.4886	-1.0005	-2.3981	0.7540	1.3977	-305.8999	-352.0387	-0.6557	-0.6330
0.1711	2.27	2200	0.4910	-1.1688	-2.6422	0.7778	1.4734	-308.3400	-353.7221	-0.6689	-0.6486
0.1492	2.37	2300	0.5077	-1.4306	-3.0185	0.7778	1.5878	-312.1035	-356.3406	-0.7016	-0.6846
0.1448	2.48	2400	0.5113	-1.6343	-3.3087	0.7659	1.6744	-315.0052	-358.3771	-0.7281	-0.7164
0.1425	2.58	2500	0.5185	-1.6767	-3.4070	0.7698	1.7304	-315.9888	-358.8008	-0.7207	-0.7101
0.1661	2.68	2600	0.5144	-1.6680	-3.3881	0.7659	1.7201	-315.7997	-358.7144	-0.7288	-0.7184
0.1755	2.79	2700	0.5153	-1.7546	-3.3676	0.7619	1.6130	-315.5941	-359.5799	-0.7388	-0.7261
0.1677	2.89	2800	0.5120	-1.7415	-3.3279	0.7540	1.5863	-315.1972	-359.4494	-0.7350	-0.7219
0.1711	2.99	2900	0.5120	-1.7362	-3.3282	0.7619	1.5920	-315.2005	-359.3962	-0.7329	-0.7195
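
To make the trend in the table easier to read, the sketch below locates the checkpoint with the lowest validation loss and compares it with the final logged step. The rows are the Step, Training Loss, and Validation Loss columns copied from the table above; the variable names are illustrative only.

```python
# (step, training loss, validation loss) copied from the table above.
rows = [
    (100, 0.6517, 0.6389), (200, 0.5605, 0.5619), (300, 0.5581, 0.5333),
    (400, 0.5467, 0.5165), (500, 0.5223, 0.5120), (600, 0.5265, 0.5085),
    (700, 0.5342, 0.4961), (800, 0.48, 0.4913), (900, 0.517, 0.4865),
    (1000, 0.477, 0.4905), (1100, 0.4033, 0.4870), (1200, 0.3284, 0.4836),
    (1300, 0.2574, 0.4861), (1400, 0.2641, 0.4897), (1500, 0.2424, 0.5010),
    (1600, 0.2944, 0.4927), (1700, 0.2958, 0.4913), (1800, 0.2667, 0.4877),
    (1900, 0.2494, 0.4853), (2000, 0.2631, 0.4869), (2100, 0.1965, 0.4886),
    (2200, 0.1711, 0.4910), (2300, 0.1492, 0.5077), (2400, 0.1448, 0.5113),
    (2500, 0.1425, 0.5185), (2600, 0.1661, 0.5144), (2700, 0.1755, 0.5153),
    (2800, 0.1677, 0.5120), (2900, 0.1711, 0.5120),
]

# Checkpoint with the lowest validation loss.
best_step, best_train, best_val = min(rows, key=lambda r: r[2])
last_step, last_train, last_val = rows[-1]

print(f"best validation loss {best_val:.4f} at step {best_step}")
print(f"final validation loss {last_val:.4f} at step {last_step}")
print(f"training loss change after best step: {last_train - best_train:+.4f}")
```

In this log the validation loss is lowest at step 1200, while the training loss keeps falling through the third epoch and the validation loss drifts back up. The reward margin continues to grow over the same span (1.1570 at step 1200, 1.5920 at step 2900), so whether an earlier checkpoint would be preferable depends on downstream evaluation, which this card does not report.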

Framework versions

Transformers 4.35.2
PyTorch 2.1.1
Datasets 2.14.7
Tokenizers 0.14.1