---
license: apache-2.0
library_name: peft
tags:
  - alignment-handbook
  - generated_from_trainer
  - trl
  - dpo
datasets:
  - snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset
base_model: mistralai/Mistral-7B-Instruct-v0.2
model-index:
  - name: zephyr-7b-dpo-qlora-pairrm
    results: []
---

zephyr-7b-dpo-qlora-pairrm

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on the snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6773
  • Rewards/chosen: -1.5442
  • Rewards/rejected: -1.6837
  • Rewards/accuracies: 0.5687
  • Rewards/margins: 0.1395
  • Logps/rejected: -394.7031
  • Logps/chosen: -375.1367
  • Logits/rejected: -4.4436
  • Logits/chosen: -4.4568
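
A minimal inference sketch is shown below. It assumes the PEFT adapter published with this card is loaded on top of the base model via `peft`'s `AutoPeftModelForCausalLM`; the adapter repository id and the generation settings are placeholders, not values recorded in this card.

```python
# Minimal inference sketch (assumptions: the adapter repo id is a placeholder and
# the generation settings are illustrative; neither is taken from this card).
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "zephyr-7b-dpo-qlora-pairrm"  # placeholder: replace with the actual Hub repo id

# Loads mistralai/Mistral-7B-Instruct-v0.2 (the base_model recorded in the adapter
# config) and applies the DPO-trained LoRA weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

messages = [{"role": "user", "content": "Explain what DPO fine-tuning does in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```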

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
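
As a rough guide, the sketch below shows how these hyperparameters could be wired into TRL's `DPOTrainer` with a QLoRA-style PEFT adapter. Only the hyperparameters listed above come from this card; the LoRA settings, 4-bit quantization, mixed-precision choice, dataset split name, and column handling are assumptions.

```python
# Rough training sketch (assumptions: LoRA rank/alpha/dropout, 4-bit quantization,
# bf16 mixed precision, and the dataset split/columns are illustrative; only the
# hyperparameters listed in this card are taken from it).
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

base_model = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# QLoRA-style 4-bit loading of the base model; exact quantization settings are assumed.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True,
                                           bnb_4bit_compute_dtype=torch.bfloat16),
)

# Split name assumed; the dataset may need mapping to the "prompt"/"chosen"/"rejected"
# columns that DPOTrainer expects.
train_dataset = load_dataset("snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset", split="train")

peft_config = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")  # assumed

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-qlora-pairrm",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # 4 x 4 = total train batch size of 16
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    bf16=True,
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a PEFT adapter, the frozen base model serves as the reference
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```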

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| 0.6909 | 0.08 | 100 | 0.6921 | -0.0169 | -0.0192 | 0.5427 | 0.0023 | -228.2522 | -222.4038 | -2.6219 | -2.6247 |
| 0.684 | 0.16 | 200 | 0.6873 | -0.0803 | -0.0944 | 0.5567 | 0.0141 | -235.7721 | -228.7468 | -2.7599 | -2.7629 |
| 0.6795 | 0.24 | 300 | 0.6839 | -0.5138 | -0.5516 | 0.5460 | 0.0378 | -281.4856 | -272.0929 | -3.5141 | -3.5199 |
| 0.6561 | 0.32 | 400 | 0.6812 | -0.8158 | -0.8788 | 0.5573 | 0.0630 | -314.2105 | -302.2954 | -3.7484 | -3.7580 |
| 0.633 | 0.4 | 500 | 0.6787 | -0.9027 | -0.9810 | 0.5597 | 0.0782 | -324.4269 | -310.9858 | -4.0978 | -4.1077 |
| 0.6302 | 0.48 | 600 | 0.6785 | -1.1692 | -1.2692 | 0.5597 | 0.1000 | -353.2493 | -337.6355 | -4.4318 | -4.4435 |
| 0.5743 | 0.56 | 700 | 0.6835 | -1.5435 | -1.6640 | 0.5630 | 0.1205 | -392.7273 | -375.0575 | -4.5047 | -4.5182 |
| 0.6443 | 0.64 | 800 | 0.6779 | -1.3860 | -1.5069 | 0.5667 | 0.1209 | -377.0208 | -359.3108 | -4.2453 | -4.2572 |
| 0.6651 | 0.72 | 900 | 0.6819 | -1.6633 | -1.8040 | 0.5693 | 0.1408 | -406.7332 | -387.0414 | -4.6039 | -4.6178 |
| 0.5993 | 0.8 | 1000 | 0.6785 | -1.5776 | -1.7191 | 0.5683 | 0.1415 | -398.2364 | -378.4713 | -4.5356 | -4.5491 |
| 0.6759 | 0.88 | 1100 | 0.6778 | -1.5515 | -1.6923 | 0.5687 | 0.1408 | -395.5604 | -375.8654 | -4.4722 | -4.4855 |
| 0.6402 | 0.96 | 1200 | 0.6773 | -1.5439 | -1.6837 | 0.5690 | 0.1398 | -394.7002 | -375.1028 | -4.4444 | -4.4577 |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.1.2
  • Datasets 2.14.6
  • Tokenizers 0.15.0