
zephyr-dpo-qlora-uf-ours-5e-6

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the generation/UF dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2165
  • Rewards/chosen: -11.7614
  • Rewards/rejected: -13.3097
  • Rewards/accuracies: 0.6570
  • Rewards/margins: 1.5483
  • Rewards/margins Max: 7.5475
  • Rewards/margins Min: -4.6893
  • Rewards/margins Std: 4.0954
  • Logps/rejected: -1589.5470
  • Logps/chosen: -1460.7323
  • Logits/rejected: -1.6252
  • Logits/chosen: -1.6946
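As a quick sanity check on the figures above, the reported reward margin should equal the chosen reward minus the rejected reward. The sketch below assumes the metrics follow the usual DPO trainer convention, where the margin is the mean of per-pair differences; the card itself does not state which trainer produced them.

```python
# Values copied from the evaluation results listed above.
rewards_chosen = -11.7614
rewards_rejected = -13.3097
reported_margin = 1.5483

# Assumption: margin = mean(chosen - rejected), which for averaged metrics
# equals mean(chosen) - mean(rejected).
margin = rewards_chosen - rewards_rejected

print(f"computed margin: {margin:.4f}")
print(f"matches reported value: {abs(margin - reported_margin) < 1e-4}")
```

Both rewards are negative, so the model assigns lower implicit reward to chosen and rejected responses alike relative to the reference; the margin only says that rejected responses are penalized more on average.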

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
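The total batch sizes follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and also illustrates a cosine schedule with linear warmup, assuming the common formulation (linear ramp from zero, then half-cosine decay to zero). The card does not report the total number of optimizer steps, so `TOTAL_STEPS` is an illustrative placeholder, and `lr_at` is a hypothetical helper rather than the trainer's own code.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 5e-6
train_batch_size = 4            # per device
eval_batch_size = 8             # per device
num_devices = 2
gradient_accumulation_steps = 2
warmup_ratio = 0.1

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)  # 16 16


def lr_at(step, total_steps, base_lr=learning_rate, ratio=warmup_ratio):
    """Learning rate at a given step: linear warmup, then cosine decay to zero."""
    warmup_steps = int(total_steps * ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # illustrative only; not the run's actual step count
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at(step, TOTAL_STEPS))
```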

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5649 | 0.28 | 100 | 0.6725 | -0.0977 | -0.1746 | 0.6030 | 0.0768 | 0.4635 | -0.2796 | 0.2508 | -276.0350 | -294.3655 | -2.6249 | -2.6588 |
| 0.2267 | 0.56 | 200 | 0.7398 | -1.8992 | -2.3924 | 0.6440 | 0.4932 | 2.6714 | -1.9000 | 1.5476 | -497.8146 | -474.5120 | -1.6188 | -1.6714 |
| 0.1011 | 0.85 | 300 | 0.9229 | -7.9227 | -8.9155 | 0.6470 | 0.9928 | 5.0516 | -3.0808 | 2.7076 | -1150.1254 | -1076.8595 | -1.4629 | -1.5271 |
| 0.1396 | 1.13 | 400 | 0.9697 | -8.3240 | -9.5107 | 0.6780 | 1.1867 | 5.7375 | -3.3923 | 3.0341 | -1209.6520 | -1116.9946 | -1.5979 | -1.6671 |
| 0.078 | 1.41 | 500 | 1.0425 | -10.1968 | -11.4586 | 0.6540 | 1.2617 | 6.1989 | -3.7953 | 3.3487 | -1404.4370 | -1304.2783 | -1.5553 | -1.6255 |
| 0.0765 | 1.69 | 600 | 1.1715 | -10.2797 | -11.7639 | 0.6610 | 1.4842 | 7.0606 | -4.5080 | 3.9021 | -1434.9708 | -1312.5632 | -1.6462 | -1.7167 |
| 0.0521 | 1.97 | 700 | 1.1039 | -12.0992 | -13.3377 | 0.6510 | 1.2385 | 6.6189 | -4.0801 | 3.5402 | -1592.3467 | -1494.5151 | -1.6384 | -1.7083 |
| 0.0325 | 2.25 | 800 | 1.2214 | -10.2420 | -11.8359 | 0.6600 | 1.5939 | 7.4536 | -4.7387 | 4.1170 | -1442.1708 | -1308.7980 | -1.6935 | -1.7630 |
| 0.0256 | 2.54 | 900 | 1.2020 | -11.6730 | -13.2282 | 0.6620 | 1.5552 | 7.4620 | -4.6114 | 4.0515 | -1581.3958 | -1451.8892 | -1.6319 | -1.7014 |
| 0.0246 | 2.82 | 1000 | 1.2154 | -11.8150 | -13.3570 | 0.6570 | 1.5420 | 7.5369 | -4.6846 | 4.0907 | -1594.2795 | -1466.0969 | -1.6263 | -1.6956 |
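In the logged results, training loss keeps falling while validation loss rises from 0.6725 at step 100 to 1.2154 at step 1000, even as reward accuracy and margin improve. The "best" checkpoint therefore depends on which metric is used for selection. The sketch below ranks the logged checkpoints by three criteria; the choice of criteria is an illustration, not something the card prescribes.

```python
# (step, training loss, validation loss, rewards/accuracies, rewards/margins)
# copied from the training results above.
rows = [
    (100, 0.5649, 0.6725, 0.6030, 0.0768),
    (200, 0.2267, 0.7398, 0.6440, 0.4932),
    (300, 0.1011, 0.9229, 0.6470, 0.9928),
    (400, 0.1396, 0.9697, 0.6780, 1.1867),
    (500, 0.0780, 1.0425, 0.6540, 1.2617),
    (600, 0.0765, 1.1715, 0.6610, 1.4842),
    (700, 0.0521, 1.1039, 0.6510, 1.2385),
    (800, 0.0325, 1.2214, 0.6600, 1.5939),
    (900, 0.0256, 1.2020, 0.6620, 1.5552),
    (1000, 0.0246, 1.2154, 0.6570, 1.5420),
]

best_val = min(rows, key=lambda r: r[2])     # lowest validation loss
best_acc = max(rows, key=lambda r: r[3])     # highest reward accuracy
best_margin = max(rows, key=lambda r: r[4])  # highest mean reward margin

print("lowest validation loss:", best_val[0], best_val[2])
print("highest reward accuracy:", best_acc[0], best_acc[3])
print("highest reward margin:", best_margin[0], best_margin[4])
```

The three criteria select different steps (100, 400, and 800 respectively), so any checkpoint selection for this run should state which metric it optimizes.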

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.0.dev0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
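Because PEFT appears among the framework versions, this repository most likely holds an adapter rather than full model weights. A minimal loading sketch follows, under several assumptions: the repository ID is taken from this page's model tree entry, the adapter config points at the base model, and the tokenizer is loaded from the base model named in the card. The helper `load_adapter_model` is hypothetical, and the call downloads a 7B base model, so it needs network access and enough memory.

```python
REPO_ID = "just1nseo/zephyr-dpo-qlora-uf-ours-5e-6"   # adapter repository (from this page)
BASE_ID = "alignment-handbook/zephyr-7b-sft-full"     # base model named in the card


def load_adapter_model(repo_id: str = REPO_ID, base_id: str = BASE_ID):
    """Load the base model with the adapter applied, plus the base tokenizer.

    Imports are kept inside the function so that defining it does not
    require torch, peft, or transformers to be installed.
    """
    import torch
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    # Assumption: adapter_config.json in the repo records the base model path,
    # so PEFT can resolve and load the base weights itself.
    model = AutoPeftModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    return model, tokenizer
```

Calling `model, tokenizer = load_adapter_model()` would then return objects usable with the standard `generate` API, provided the assumptions above hold.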

Model tree for just1nseo/zephyr-dpo-qlora-uf-ours-5e-6: this model is one of the adapters (136) listed in the tree.