---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
  - AmberYifan/safetyQA_DPO
model-index:
  - name: zephyr-7b-sft-safeDPO
    results: []
---

# zephyr-7b-sft-safeDPO

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized and the AmberYifan/safetyQA_DPO datasets. It achieves the following results on the evaluation set:

- Loss: 0.5427
- Rewards/chosen: -2.2653
- Rewards/rejected: -2.9944
- Rewards/accuracies: 0.7197
- Rewards/margins: 0.7291
- Logps/rejected: -469.7640
- Logps/chosen: -389.2892
- Logits/rejected: -1.7787
- Logits/chosen: -1.7892
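The reward metrics above come from the standard DPO formulation: the implicit reward for a response is β times the difference between the policy and frozen-reference log-probabilities, "Rewards/margins" is the mean chosen-minus-rejected reward gap, and the per-pair loss is −log σ of that gap. A minimal sketch of that arithmetic (β and all numeric inputs below are illustrative assumptions, not values taken from this training run):

```python
import math

def dpo_metrics(logp_chosen, logp_rejected,
                ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Compute DPO implicit rewards, margin, and loss for one preference pair.

    Implicit reward = beta * (policy log-prob - reference log-prob);
    loss = -log(sigmoid(reward_chosen - reward_rejected)).
    """
    reward_chosen = beta * (logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (logp_rejected - ref_logp_rejected)
    margin = reward_chosen - reward_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return reward_chosen, reward_rejected, margin, loss

# Illustrative log-probabilities only (hypothetical, not from the eval set).
rc, rr, margin, loss = dpo_metrics(
    logp_chosen=-389.3, logp_rejected=-469.8,
    ref_logp_chosen=-366.6, ref_logp_rejected=-439.8)
```

Negative rewards for both chosen and rejected responses, as in the table above, simply mean the policy assigns both lower log-probability than the reference model; what the objective optimizes is the margin between them.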

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

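The effective batch size and learning-rate schedule follow from the figures above: the total train batch size is the per-device batch size times the gradient-accumulation steps times the number of devices, and the cosine schedule warms up linearly over the first 10% of steps before decaying to zero. A small sketch of that arithmetic (the schedule helper is an illustrative reimplementation, not the exact Transformers scheduler):

```python
import math

# Effective batch size: 8 per device x 2 accumulation steps x 4 GPUs = 64.
train_batch_size = 8
gradient_accumulation_steps = 2
num_devices = 4
total_train_batch_size = (train_batch_size
                          * gradient_accumulation_steps
                          * num_devices)

def cosine_lr_with_warmup(step, total_steps, base_lr=5e-7, warmup_ratio=0.1):
    """Linear warmup for the first warmup_ratio of steps, then cosine decay."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With warmup_ratio 0.1, the rate reaches its 5e-07 peak a tenth of the way through the single epoch and decays to zero by the final step.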
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6809        | 0.06  | 100  | 0.6816          | -0.0896        | -0.1145          | 0.6090             | 0.0249          | -181.7760      | -171.7131    | -2.5505         | -2.5392       |
| 0.6002        | 0.12  | 200  | 0.5906          | -1.4710        | -1.9163          | 0.6844             | 0.4453          | -361.9523      | -309.8549    | -2.3566         | -2.3656       |
| 0.591         | 0.17  | 300  | 0.5809          | -2.0305        | -2.5280          | 0.6924             | 0.4975          | -423.1273      | -365.8047    | -2.1933         | -2.2068       |
| 0.5437        | 0.23  | 400  | 0.5684          | -1.8038        | -2.3666          | 0.7032             | 0.5628          | -406.9888      | -343.1347    | -1.8974         | -1.9247       |
| 0.5415        | 0.29  | 500  | 0.5648          | -2.4230        | -3.0650          | 0.7066             | 0.6419          | -476.8223      | -405.0594    | -1.7952         | -1.8126       |
| 0.564         | 0.35  | 600  | 0.5578          | -2.3499        | -2.9909          | 0.7192             | 0.6409          | -469.4118      | -397.7481    | -1.8836         | -1.8847       |
| 0.5769        | 0.4   | 700  | 0.5598          | -2.2030        | -2.8326          | 0.7032             | 0.6296          | -453.5823      | -383.0532    | -1.7859         | -1.7719       |
| 0.5598        | 0.46  | 800  | 0.5586          | -2.2442        | -2.8471          | 0.7163             | 0.6029          | -455.0379      | -387.1816    | -1.7087         | -1.7061       |
| 0.5374        | 0.52  | 900  | 0.5555          | -2.1983        | -2.8363          | 0.7152             | 0.6379          | -453.9529      | -382.5883    | -1.6598         | -1.6767       |
| 0.5036        | 0.58  | 1000 | 0.5499          | -2.2315        | -2.9217          | 0.7209             | 0.6902          | -462.5011      | -385.9115    | -1.7160         | -1.7254       |
| 0.5281        | 0.63  | 1100 | 0.5489          | -2.2855        | -2.9604          | 0.7237             | 0.6749          | -466.3712      | -391.3100    | -1.7504         | -1.7563       |
| 0.5067        | 0.69  | 1200 | 0.5448          | -2.3024        | -3.0075          | 0.7243             | 0.7051          | -471.0760      | -393.0004    | -1.7967         | -1.8047       |
| 0.5095        | 0.75  | 1300 | 0.5451          | -2.2081        | -2.9003          | 0.7186             | 0.6922          | -460.3614      | -383.5680    | -1.8238         | -1.8248       |
| 0.5265        | 0.81  | 1400 | 0.5436          | -2.2844        | -2.9947          | 0.7215             | 0.7103          | -469.7991      | -391.1993    | -1.7998         | -1.8071       |
| 0.4844        | 0.86  | 1500 | 0.5433          | -2.2422        | -2.9533          | 0.7197             | 0.7112          | -465.6614      | -386.9761    | -1.7888         | -1.7993       |
| 0.5612        | 0.92  | 1600 | 0.5427          | -2.2671        | -2.9948          | 0.7209             | 0.7278          | -469.8113      | -389.4626    | -1.7766         | -1.7876       |
| 0.5017        | 0.98  | 1700 | 0.5426          | -2.2658        | -2.9947          | 0.7215             | 0.7289          | -469.7990      | -389.3405    | -1.7796         | -1.7904       |

### Framework versions

- Transformers 4.39.0.dev0
- Pytorch 2.3.0+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2