---
license: apache-2.0
library_name: peft
tags:
  - alignment-handbook
  - generated_from_trainer
  - trl
  - dpo
datasets:
  - snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset
base_model: mistralai/Mistral-7B-Instruct-v0.2
model-index:
  - name: zephyr-7b-dpo-qlora-pairrm
    results: []
---

zephyr-7b-dpo-qlora-pairrm

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on the snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6773
  • Rewards/chosen: -1.5442
  • Rewards/rejected: -1.6837
  • Rewards/accuracies: 0.5687
  • Rewards/margins: 0.1395
  • Logps/rejected: -394.7031
  • Logps/chosen: -375.1367
  • Logits/rejected: -4.4436
  • Logits/chosen: -4.4568
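
A minimal inference sketch is shown below. It assumes the PEFT adapter published with this card is loaded on top of the base model via `peft`'s `AutoPeftModelForCausalLM`; the adapter repository id and the generation settings are placeholders, not values recorded in this card.

```python
# Minimal inference sketch (assumptions: the adapter repo id is a placeholder and
# the generation settings are illustrative; neither is taken from this card).
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "zephyr-7b-dpo-qlora-pairrm"  # placeholder: replace with the actual Hub repo id

# Loads mistralai/Mistral-7B-Instruct-v0.2 (the base_model recorded in the adapter
# config) and applies the DPO-trained LoRA weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

messages = [{"role": "user", "content": "Explain what DPO fine-tuning does in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```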

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
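
As a rough guide, the sketch below shows how these hyperparameters could be wired into TRL's `DPOTrainer` with a QLoRA-style PEFT adapter. Only the hyperparameters listed above come from this card; the LoRA settings, 4-bit quantization, mixed-precision choice, dataset split name, and column handling are assumptions.

```python
# Rough training sketch (assumptions: LoRA rank/alpha/dropout, 4-bit quantization,
# bf16 mixed precision, and the dataset split/columns are illustrative; only the
# hyperparameters listed in this card are taken from it).
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

base_model = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# QLoRA-style 4-bit loading of the base model; exact quantization settings are assumed.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True,
                                           bnb_4bit_compute_dtype=torch.bfloat16),
)

# Split name assumed; the dataset may need mapping to the "prompt"/"chosen"/"rejected"
# columns that DPOTrainer expects.
train_dataset = load_dataset("snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset", split="train")

peft_config = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")  # assumed

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora-pairrm",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # 4 x 4 = total train batch size of 16
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    bf16=True,
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a PEFT adapter, the frozen base model serves as the reference
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```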

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| 0.6909 | 0.08 | 100 | 0.6921 | -0.0169 | -0.0192 | 0.5427 | 0.0023 | -228.2522 | -222.4038 | -2.6219 | -2.6247 |
| 0.684 | 0.16 | 200 | 0.6873 | -0.0803 | -0.0944 | 0.5567 | 0.0141 | -235.7721 | -228.7468 | -2.7599 | -2.7629 |
| 0.6795 | 0.24 | 300 | 0.6839 | -0.5138 | -0.5516 | 0.5460 | 0.0378 | -281.4856 | -272.0929 | -3.5141 | -3.5199 |
| 0.6561 | 0.32 | 400 | 0.6812 | -0.8158 | -0.8788 | 0.5573 | 0.0630 | -314.2105 | -302.2954 | -3.7484 | -3.7580 |
| 0.633 | 0.4 | 500 | 0.6787 | -0.9027 | -0.9810 | 0.5597 | 0.0782 | -324.4269 | -310.9858 | -4.0978 | -4.1077 |
| 0.6302 | 0.48 | 600 | 0.6785 | -1.1692 | -1.2692 | 0.5597 | 0.1000 | -353.2493 | -337.6355 | -4.4318 | -4.4435 |
| 0.5743 | 0.56 | 700 | 0.6835 | -1.5435 | -1.6640 | 0.5630 | 0.1205 | -392.7273 | -375.0575 | -4.5047 | -4.5182 |
| 0.6443 | 0.64 | 800 | 0.6779 | -1.3860 | -1.5069 | 0.5667 | 0.1209 | -377.0208 | -359.3108 | -4.2453 | -4.2572 |
| 0.6651 | 0.72 | 900 | 0.6819 | -1.6633 | -1.8040 | 0.5693 | 0.1408 | -406.7332 | -387.0414 | -4.6039 | -4.6178 |
| 0.5993 | 0.8 | 1000 | 0.6785 | -1.5776 | -1.7191 | 0.5683 | 0.1415 | -398.2364 | -378.4713 | -4.5356 | -4.5491 |
| 0.6759 | 0.88 | 1100 | 0.6778 | -1.5515 | -1.6923 | 0.5687 | 0.1408 | -395.5604 | -375.8654 | -4.4722 | -4.4855 |
| 0.6402 | 0.96 | 1200 | 0.6773 | -1.5439 | -1.6837 | 0.5690 | 0.1398 | -394.7002 | -375.1028 | -4.4444 | -4.4577 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.1.2
  • Datasets 2.14.6
  • Tokenizers 0.15.0