# zephyr-7b-dpo-oursuf6k-qlora-5e-6
This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the generation/UF6k dataset. It achieves the following results on the evaluation set (the reward metrics are sketched in code after the list):
- Loss: 0.5972
- Rewards/chosen: -2.1562
- Rewards/rejected: -2.9617
- Rewards/accuracies: 0.6865
- Rewards/margins: 0.8055
- Rewards/margins Max: 2.4253
- Rewards/margins Min: -0.7592
- Rewards/margins Std: 1.4237
- Logps/rejected: -555.3531
- Logps/chosen: -500.8411
- Logits/rejected: -1.8009
- Logits/chosen: -1.8449
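
The `Rewards/*` entries follow the convention used in TRL-style DPO training: each reward is the β-scaled log-probability ratio between the policy and the frozen reference (SFT) model, and the margin is the chosen-minus-rejected difference. Below is a minimal sketch of how these quantities relate, assuming TRL-style definitions; the β value for this run is not reported in the card, so `beta` is a placeholder.

```python
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.01):
    """Sketch of the reported DPO metrics (TRL-style definitions).

    Each *_logps tensor holds summed log-probabilities of complete responses
    under the policy or the frozen reference (SFT) model. `beta` is a
    placeholder; the value used for this run is not listed in the card.
    """
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)        # Rewards/chosen
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)  # Rewards/rejected
    margins = rewards_chosen - rewards_rejected                             # Rewards/margins
    accuracy = (margins > 0).float().mean()                                 # Rewards/accuracies
    loss = -F.logsigmoid(margins).mean()                                    # sigmoid DPO loss
    return loss, rewards_chosen, rewards_rejected, margins, accuracy
```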
## Model description
More information needed
## Intended uses & limitations
More information needed
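
Since the adapter was trained with QLoRA via PEFT, one way to try it is to load the adapter directly. The following is a minimal sketch, assuming the adapter weights are hosted under the repo id `just1nseo/zephyr-7b-dpo-oursuf6k-qlora-5e-6` and that the tokenizer (with its chat template) is taken from the SFT base:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "just1nseo/zephyr-7b-dpo-oursuf6k-qlora-5e-6"
base_id = "alignment-handbook/zephyr-7b-sft-full"

# Loads the base model recorded in the adapter config and applies the LoRA weights on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

messages = [{"role": "user", "content": "Explain direct preference optimization in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```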
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch in code follows the list):
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 16
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
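
For context, here is a hedged sketch of how these hyperparameters could be wired into a TRL `DPOTrainer` run with a QLoRA adapter (TRL 0.7-style API). The LoRA rank/alpha, quantization settings, DPO β, and dataset split names are assumptions, not values reported in this card:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "alignment-handbook/zephyr-7b-sft-full"
tokenizer = AutoTokenizer.from_pretrained(base_id)

# 4-bit base model for QLoRA; the exact quantization settings are an assumption.
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, load_in_4bit=True, device_map="auto"
)

# LoRA rank/alpha/targets are assumptions; the card does not report them.
peft_config = LoraConfig(
    r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Values below mirror the hyperparameter list above (per-device batch sizes, LR, scheduler, seed, epochs).
args = TrainingArguments(
    output_dir="zephyr-7b-dpo-oursuf6k-qlora-5e-6",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    seed=42,
)

# Placeholder: the card only names the dataset as "generation/UF6k".
dataset = load_dataset("generation/UF6k")

trainer = DPOTrainer(
    model,
    ref_model=None,   # with a PEFT adapter, the frozen base acts as the implicit reference model
    args=args,
    beta=0.01,        # assumption: the DPO beta is not reported in the card
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

The per-device values above are consistent with the reported totals: 2 × 8 GPUs = 16 for training and 4 × 8 = 32 for evaluation, i.e. no gradient accumulation.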
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Rewards/margins Max | Rewards/margins Min | Rewards/margins Std | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5821 | 0.15 | 100 | 0.6622 | -0.1867 | -0.2755 | 0.6151 | 0.0888 | 0.3953 | -0.1795 | 0.2588 | -286.7321 | -303.8942 | -2.6966 | -2.7355 |
| 0.481 | 0.29 | 200 | 0.6257 | -1.2575 | -1.6473 | 0.6746 | 0.3898 | 1.3412 | -0.5259 | 0.8283 | -423.9109 | -410.9716 | -2.5402 | -2.5661 |
| 0.4017 | 0.44 | 300 | 0.6112 | -1.7680 | -2.5016 | 0.6944 | 0.7336 | 2.3329 | -0.8123 | 1.4011 | -509.3477 | -462.0217 | -1.9880 | -2.0224 |
| 0.3427 | 0.58 | 400 | 0.5955 | -1.9140 | -2.6859 | 0.7024 | 0.7719 | 2.2721 | -0.7218 | 1.3401 | -527.7765 | -476.6219 | -1.9447 | -1.9863 |
| 0.3246 | 0.73 | 500 | 0.6026 | -2.2815 | -3.0194 | 0.6627 | 0.7379 | 2.2879 | -0.7821 | 1.3716 | -561.1234 | -513.3748 | -1.8444 | -1.8864 |
| 0.2747 | 0.88 | 600 | 0.5973 | -2.1734 | -2.9762 | 0.6786 | 0.8029 | 2.4273 | -0.7515 | 1.4233 | -556.8073 | -502.5607 | -1.7934 | -1.8380 |
### Framework versions
- PEFT 0.7.1
- Transformers 4.39.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2
## Model tree

- Model: just1nseo/zephyr-7b-dpo-oursuf6k-qlora-5e-6
- Fine-tuned from: alignment-handbook/zephyr-7b-sft-full
- Base model: mistralai/Mistral-7B-v0.1