# phi-1_5-dpo
This model is a version of [rasyosef/phi-1_5-sft](https://huggingface.co/rasyosef/phi-1_5-sft) fine-tuned with Direct Preference Optimization (DPO) on an unknown dataset.
It achieves the following results on the evaluation set (the final checkpoint, step 3312):
- Loss: 0.5013
- Rewards/chosen: -1.0250
- Rewards/rejected: -2.3893
- Rewards/accuracies: 0.7283
- Rewards/margins: 1.3643
- Logps/rejected: -162.0916
- Logps/chosen: -128.1033
- Logits/rejected: 5.3082
- Logits/chosen: 5.1890
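
The `Rewards/*` metrics follow TRL's `DPOTrainer` logging conventions: the implicit reward of a completion is `beta` times the difference between the policy's and the frozen reference model's log-probabilities, margins are chosen-minus-rejected rewards, and accuracies are the fraction of preference pairs where the chosen completion scores higher. The sketch below illustrates that bookkeeping; `beta` and the log-probabilities are stand-ins, not values from this run.

```python
import torch

beta = 0.1  # assumed stand-in; the beta used for this run is not reported in the card

def dpo_reward_metrics(policy_chosen_logps, policy_rejected_logps,
                       ref_chosen_logps, ref_rejected_logps):
    """Derive the Rewards/* metrics from per-example summed log-probs."""
    # Implicit DPO reward: beta * (log pi(y|x) - log pi_ref(y|x))
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return {
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        # margin = chosen reward minus rejected reward, averaged over the batch
        "rewards/margins": (chosen_rewards - rejected_rewards).mean().item(),
        # accuracy = fraction of pairs where the chosen completion wins
        "rewards/accuracies": (chosen_rewards > rejected_rewards).float().mean().item(),
    }
```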
## Model description
More information needed
## Intended uses & limitations
More information needed
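
As a starting point, here is a minimal inference sketch. It assumes the checkpoint is a PEFT adapter (PEFT is listed under Framework versions) published under the repo id `rasyosef/phi-1_5-dpo`, which is inferred from the model name rather than stated in this card, and that plain-text prompts are acceptable.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# The base repo id is named in this card; the adapter repo id is an assumption.
base_id = "rasyosef/phi-1_5-sft"
adapter_id = "rasyosef/phi-1_5-dpo"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the DPO adapter
model.eval()

prompt = "Write a short poem about the sea."
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```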
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training; a hedged DPOTrainer sketch that mirrors them follows the list:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 300
- num_epochs: 3
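
The sketch below reconstructs the run configuration from the hyperparameters above, assuming a TRL version contemporary with the Transformers release listed under Framework versions. The training data, `beta`, and LoRA settings are not reported in this card, so the toy dataset and `LoraConfig` here are placeholders for illustration only.

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Policy initialised from the SFT checkpoint named in this card.
model = AutoModelForCausalLM.from_pretrained("rasyosef/phi-1_5-sft")
tokenizer = AutoTokenizer.from_pretrained("rasyosef/phi-1_5-sft")

# Toy preference pairs purely for illustration; the real dataset is not reported.
preference_dataset = Dataset.from_dict({
    "prompt": ["What is 2 + 2?"],
    "chosen": ["2 + 2 equals 4."],
    "rejected": ["2 + 2 equals 5."],
})

# The actual LoRA settings are unknown; these values are placeholders.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)

# beta is left at TRL's default because the card does not report it.
config = DPOConfig(
    output_dir="phi-1_5-dpo",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # 8 * 2 = total train batch size 16
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=300,
    num_train_epochs=3,
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a PEFT adapter, the frozen base model serves as reference
    args=config,
    train_dataset=preference_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```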
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6899 | 0.1241 | 138 | 0.6769 | -0.0153 | -0.0504 | 0.625 | 0.0351 | -138.7025 | -118.0066 | 4.5710 | 4.4532 |
| 0.6309 | 0.2482 | 276 | 0.6035 | -0.2012 | -0.5586 | 0.7120 | 0.3575 | -143.7850 | -119.8655 | 4.5167 | 4.3940 |
| 0.5756 | 0.3723 | 414 | 0.5669 | -0.3693 | -0.9842 | 0.7174 | 0.6149 | -148.0405 | -121.5467 | 4.6242 | 4.5060 |
| 0.5715 | 0.4964 | 552 | 0.5446 | -0.4109 | -1.1855 | 0.7283 | 0.7745 | -150.0534 | -121.9633 | 4.7324 | 4.6143 |
| 0.5449 | 0.6205 | 690 | 0.5331 | -0.4666 | -1.3090 | 0.7446 | 0.8424 | -151.2884 | -122.5196 | 4.8229 | 4.7080 |
| 0.5536 | 0.7446 | 828 | 0.5136 | -0.4885 | -1.3825 | 0.7446 | 0.8940 | -152.0234 | -122.7389 | 4.8867 | 4.7737 |
| 0.5253 | 0.8687 | 966 | 0.5057 | -0.5613 | -1.5446 | 0.7554 | 0.9832 | -153.6442 | -123.4672 | 4.9287 | 4.8080 |
| 0.5249 | 0.9928 | 1104 | 0.5054 | -0.5101 | -1.4656 | 0.75 | 0.9555 | -152.8544 | -122.9549 | 4.8704 | 4.7521 |
| 0.4631 | 1.1169 | 1242 | 0.5067 | -0.6889 | -1.7678 | 0.75 | 1.0789 | -155.8768 | -124.7426 | 4.8470 | 4.7276 |
| 0.4524 | 1.2410 | 1380 | 0.5006 | -0.7467 | -1.9049 | 0.7446 | 1.1582 | -157.2474 | -125.3205 | 4.9447 | 4.8239 |
| 0.424 | 1.3651 | 1518 | 0.5036 | -0.7638 | -2.0144 | 0.7337 | 1.2505 | -158.3425 | -125.4923 | 4.9235 | 4.8002 |
| 0.4428 | 1.4892 | 1656 | 0.5004 | -0.7790 | -2.0132 | 0.7446 | 1.2342 | -158.3307 | -125.6437 | 4.9576 | 4.8375 |
| 0.4424 | 1.6133 | 1794 | 0.4944 | -0.8220 | -2.0517 | 0.7391 | 1.2297 | -158.7152 | -126.0739 | 4.9736 | 4.8553 |
| 0.4358 | 1.7374 | 1932 | 0.5022 | -0.8091 | -1.9993 | 0.7228 | 1.1902 | -158.1918 | -125.9447 | 5.0894 | 4.9702 |
| 0.4426 | 1.8615 | 2070 | 0.4992 | -0.8254 | -2.0308 | 0.7228 | 1.2054 | -158.5065 | -126.1077 | 5.0943 | 4.9780 |
| 0.4226 | 1.9856 | 2208 | 0.4971 | -0.8701 | -2.1434 | 0.7283 | 1.2733 | -159.6329 | -126.5553 | 5.1222 | 5.0011 |
| 0.3684 | 2.1097 | 2346 | 0.5032 | -0.9201 | -2.2281 | 0.7228 | 1.3081 | -160.4799 | -127.0545 | 5.2209 | 5.1031 |
| 0.3695 | 2.2338 | 2484 | 0.5022 | -0.9332 | -2.2651 | 0.7228 | 1.3319 | -160.8495 | -127.1860 | 5.2170 | 5.0977 |
| 0.3693 | 2.3579 | 2622 | 0.5022 | -0.9418 | -2.2839 | 0.7283 | 1.3421 | -161.0379 | -127.2717 | 5.2390 | 5.1169 |
| 0.3659 | 2.4820 | 2760 | 0.5037 | -0.9820 | -2.3392 | 0.7228 | 1.3572 | -161.5908 | -127.6742 | 5.2392 | 5.1148 |
| 0.3557 | 2.6061 | 2898 | 0.5031 | -1.0001 | -2.3531 | 0.7228 | 1.3529 | -161.7294 | -127.8552 | 5.2704 | 5.1488 |
| 0.3491 | 2.7302 | 3036 | 0.5053 | -1.0242 | -2.3803 | 0.7228 | 1.3562 | -162.0017 | -128.0954 | 5.2880 | 5.1693 |
| 0.3512 | 2.8543 | 3174 | 0.5036 | -1.0265 | -2.3833 | 0.7174 | 1.3568 | -162.0320 | -128.1190 | 5.2965 | 5.1768 |
| 0.3458 | 2.9784 | 3312 | 0.5013 | -1.0250 | -2.3893 | 0.7283 | 1.3643 | -162.0916 | -128.1033 | 5.3082 | 5.1890 |
### Framework versions
- PEFT 0.11.1
- Transformers 4.42.4
- PyTorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1