zephyr-7b-dpo-lora-pubmedqa-mix2

This model is a fine-tuned version of EllieS/zephyr-7b-sft-qlora on the EllieS/pubmedqa_dpo_mix_data dataset. It achieves the following results on the evaluation set:

Loss: 0.0013
Rewards/chosen: -1.8126
Rewards/rejected: -10.9731
Rewards/accuracies: 1.0
Rewards/margins: 9.1605
Logps/rejected: -1144.0397
Logps/chosen: -242.4412
Logits/rejected: -1.7638
Logits/chosen: -2.8841
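
The reported margin can be cross-checked against the two reward values. The sketch below assumes the usual DPO logging convention, in which the margin is the chosen reward minus the rejected reward; the card does not state which trainer produced these metrics, so treat the convention as an assumption.

```python
# Evaluation metrics copied from the list above.
rewards_chosen = -1.8126
rewards_rejected = -10.9731
reported_margin = 9.1605

# Assumed convention: margin = chosen reward - rejected reward.
margin = rewards_chosen - rewards_rejected

# Allow for rounding in the reported four-decimal values.
assert abs(margin - reported_margin) < 1e-3
print(f"margin = {margin:.4f}")
```

A positive margin with Rewards/accuracies of 1.0 means the chosen response was scored above the rejected one for every evaluation pair.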

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 2
total_train_batch_size: 4
total_eval_batch_size: 2
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
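
The total batch sizes follow from the per-device values, and the learning rate follows a warmup-then-cosine shape. The sketch below reproduces both with plain Python. The schedule formula (linear warmup, then cosine decay to zero) is an assumption about what `cosine` with a warmup ratio means here, and `total_steps` is a hypothetical value chosen for illustration, since the card does not state the total number of optimizer steps.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 1  # per device
eval_batch_size = 1   # per device
num_devices = 2
gradient_accumulation_steps = 2
learning_rate = 5e-06
warmup_ratio = 0.1

# Effective batch sizes.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
assert total_train_batch_size == 4
assert total_eval_batch_size == 2


def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup, then cosine decay to zero (assumed schedule shape)."""
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


# HYPOTHETICAL step count, for illustration only.
total_steps = 1000
for step in (0, 100, 550, 1000):
    print(step, lr_at(step, total_steps))
```

With these assumptions the rate climbs linearly over the first 10% of steps, peaks at 5e-06, and falls to roughly half that value midway through the decay phase.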

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.2697	0.04	3000	0.3396	0.2213	-0.6386	1.0	0.8599	-110.5876	-39.0518	-3.0278	-3.0862
0.1599	0.07	6000	0.0750	-0.5884	-3.6673	1.0	3.0789	-413.4546	-120.0204	-2.9055	-3.0346
0.0563	0.11	9000	0.0204	-0.6260	-5.6712	1.0	5.0452	-613.8441	-123.7819	-3.0269	-3.1136
0.0463	0.14	12000	0.0287	-0.7209	-7.9224	1.0	7.2014	-838.9609	-133.2740	-3.0642	-3.1628
0.1206	0.18	15000	0.0030	-0.9209	-8.8089	1.0	7.8880	-927.6118	-153.2670	-3.0802	-3.1766
0.0508	0.22	18000	0.4964	-0.4026	-8.0330	1.0	7.6304	-850.0245	-101.4397	-3.1314	-3.2075
0.0323	0.25	21000	0.0872	-1.4713	-10.3437	1.0	8.8723	-1081.0913	-208.3129	-2.6496	-3.1189
0.4534	0.29	24000	0.0077	-2.3507	-12.1827	1.0	9.8320	-1264.9957	-296.2491	-1.6282	-2.8665
0.0013	0.32	27000	0.0019	-2.1480	-10.6645	1.0	8.5166	-1113.1797	-275.9768	-1.7614	-2.8604
0.1404	0.36	30000	0.0002	-2.4964	-12.4101	1.0	9.9138	-1287.7384	-310.8155	-1.5907	-2.8352
0.0198	0.4	33000	0.0009	-3.0802	-13.3347	1.0	10.2545	-1380.1964	-369.1991	-1.6628	-2.8372
0.0041	0.43	36000	0.0004	-2.7800	-12.5815	1.0	9.8014	-1304.8732	-339.1852	-1.6282	-2.8242
0.0007	0.47	39000	0.0007	-2.9921	-13.2089	1.0	10.2168	-1367.6129	-360.3922	-1.6672	-2.8403
0.0008	0.5	42000	0.0013	-2.3107	-11.8754	1.0	9.5647	-1234.2609	-292.2454	-1.6475	-2.8400
0.0024	0.54	45000	0.0010	-3.3769	-13.2333	1.0	9.8564	-1370.0538	-398.8731	-1.6937	-2.8403
0.0019	0.57	48000	0.0013	-2.8151	-12.4427	1.0	9.6277	-1290.9999	-342.6892	-1.7047	-2.8503
0.2266	0.61	51000	0.0014	-1.9532	-11.0212	1.0	9.0680	-1148.8468	-256.4992	-1.6745	-2.8650
0.0016	0.65	54000	0.0014	-1.8077	-10.7512	1.0	8.9435	-1121.8423	-241.9466	-1.8328	-2.8946
0.0019	0.68	57000	0.0013	-1.8159	-10.8808	1.0	9.0649	-1134.8024	-242.7715	-1.7644	-2.8860
0.0013	0.72	60000	0.0013	-1.7356	-10.8007	1.0	9.0651	-1126.8002	-234.7419	-1.7574	-2.8871
0.0014	0.75	63000	0.0013	-1.8249	-10.9773	1.0	9.1524	-1144.4586	-243.6743	-1.7699	-2.8867
0.0014	0.79	66000	0.0013	-1.8308	-10.9698	1.0	9.1389	-1143.7017	-244.2651	-1.7597	-2.8841
0.0011	0.83	69000	0.0013	-1.8034	-10.9390	1.0	9.1356	-1140.6276	-241.5220	-1.7619	-2.8858
0.0016	0.86	72000	0.0013	-1.7971	-10.9097	1.0	9.1126	-1137.6914	-240.8868	-1.7608	-2.8852
0.0239	0.9	75000	0.0013	-1.7976	-10.9400	1.0	9.1424	-1140.7238	-240.9355	-1.7773	-2.8872
0.0024	0.93	78000	0.0013	-1.7862	-10.9196	1.0	9.1334	-1138.6901	-239.8036	-1.7733	-2.8861
0.0018	0.97	81000	0.0013	-1.8228	-10.9802	1.0	9.1574	-1144.7491	-243.4639	-1.7594	-2.8860
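
The table is tab-separated, so it can be read with the standard `csv` module. The sketch below parses a short excerpt (the first four columns of five rows copied from the table) and picks out the lowest and highest validation loss; the variable names are illustrative only.

```python
import csv
import io

# Excerpt of the results table above (first four columns, selected rows only).
excerpt = (
    "Training Loss\tEpoch\tStep\tValidation Loss\n"
    "0.1206\t0.18\t15000\t0.0030\n"
    "0.0508\t0.22\t18000\t0.4964\n"
    "0.0323\t0.25\t21000\t0.0872\n"
    "0.1404\t0.36\t30000\t0.0002\n"
    "0.0018\t0.97\t81000\t0.0013\n"
)

# Keep only the step and validation loss from each row.
rows = [
    {"step": int(r["Step"]), "val_loss": float(r["Validation Loss"])}
    for r in csv.DictReader(io.StringIO(excerpt), delimiter="\t")
]

best = min(rows, key=lambda r: r["val_loss"])
worst = max(rows, key=lambda r: r["val_loss"])
print("lowest validation loss:", best)
print("highest validation loss:", worst)
```

The same two rows are the extremes of the full table: validation loss is lowest at step 30000 (0.0002) and spikes at step 18000 (0.4964) before settling at 0.0013 from step 57000 onward.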

Framework versions

PEFT 0.7.1
Transformers 4.36.2
Pytorch 2.1.2+cu121
Datasets 2.14.6
Tokenizers 0.15.2
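
To check a local environment against these versions, something like the following could be used. It is a minimal sketch: the distribution names (`peft`, `transformers`, `torch`, `datasets`, `tokenizers`) are assumed to be the PyPI names of the frameworks listed above, and `find_mismatches` is a hypothetical helper, not part of any of these libraries.

```python
from importlib import metadata

# Versions listed above; the distribution names are assumptions.
EXPECTED = {
    "peft": "0.7.1",
    "transformers": "4.36.2",
    "torch": "2.1.2+cu121",
    "datasets": "2.14.6",
    "tokenizers": "0.15.2",
}


def find_mismatches(expected, lookup=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except (metadata.PackageNotFoundError, KeyError):
            installed = None  # package is not installed
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed}")
```

The `lookup` parameter exists only so the comparison can be exercised without the packages installed; an exact string match is deliberately strict and will also flag a different CUDA build of PyTorch.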
