---
license: llama3
base_model: tsavage68/UTI_L3_1000steps_1e5rate_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: UTI2_L3_1000steps_1e7rate_01beta_CSFTDPO
  results: []
---

# UTI2_L3_1000steps_1e7rate_01beta_CSFTDPO

This model is a fine-tuned version of [tsavage68/UTI_L3_1000steps_1e5rate_SFT](https://huggingface.co/tsavage68/UTI_L3_1000steps_1e5rate_SFT) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3201
- Rewards/chosen: 0.3631
- Rewards/rejected: -1.0607
- Rewards/accuracies: 0.6500
- Rewards/margins: 1.4238
- Logps/rejected: -39.0917
- Logps/chosen: -15.4719
- Logits/rejected: -1.1656
- Logits/chosen: -1.1559

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6928 | 0.3333 | 25 | 0.6924 | 0.0009 | -0.0007 | 0.3600 | 0.0016 | -28.4922 | -19.0947 | -1.1524 | -1.1488 |
| 0.6893 | 0.6667 | 50 | 0.6863 | 0.0103 | -0.0035 | 0.6100 | 0.0138 | -28.5194 | -19.0000 | -1.1524 | -1.1488 |
| 0.6736 | 1.0 | 75 | 0.6701 | 0.0321 | -0.0151 | 0.6300 | 0.0471 | -28.6352 | -18.7825 | -1.1527 | -1.1490 |
| 0.622 | 1.3333 | 100 | 0.6366 | 0.0753 | -0.0439 | 0.6400 | 0.1192 | -28.9234 | -18.3503 | -1.1534 | -1.1493 |
| 0.5805 | 1.6667 | 125 | 0.5938 | 0.1241 | -0.0945 | 0.6500 | 0.2187 | -29.4300 | -17.8620 | -1.1538 | -1.1494 |
| 0.5772 | 2.0 | 150 | 0.5571 | 0.1610 | -0.1510 | 0.6500 | 0.3120 | -29.9945 | -17.4931 | -1.1547 | -1.1499 |
| 0.516 | 2.3333 | 175 | 0.5241 | 0.1902 | -0.2143 | 0.6500 | 0.4044 | -30.6273 | -17.2015 | -1.1556 | -1.1505 |
| 0.4638 | 2.6667 | 200 | 0.4925 | 0.2168 | -0.2852 | 0.6500 | 0.5020 | -31.3371 | -16.9357 | -1.1559 | -1.1505 |
| 0.4365 | 3.0 | 225 | 0.4649 | 0.2446 | -0.3517 | 0.6500 | 0.5963 | -32.0016 | -16.6578 | -1.1566 | -1.1508 |
| 0.4083 | 3.3333 | 250 | 0.4422 | 0.2622 | -0.4193 | 0.6500 | 0.6815 | -32.6772 | -16.4813 | -1.1577 | -1.1516 |
| 0.3553 | 3.6667 | 275 | 0.4223 | 0.2800 | -0.4859 | 0.6500 | 0.7659 | -33.3439 | -16.3032 | -1.1584 | -1.1520 |
| 0.4039 | 4.0 | 300 | 0.4063 | 0.2911 | -0.5469 | 0.6500 | 0.8380 | -33.9535 | -16.1919 | -1.1592 | -1.1526 |
| 0.3674 | 4.3333 | 325 | 0.3920 | 0.3016 | -0.6087 | 0.6500 | 0.9103 | -34.5715 | -16.0876 | -1.1606 | -1.1537 |
| 0.2812 | 4.6667 | 350 | 0.3792 | 0.3135 | -0.6673 | 0.6500 | 0.9808 | -35.1574 | -15.9683 | -1.1613 | -1.1541 |
| 0.3317 | 5.0 | 375 | 0.3685 | 0.3227 | -0.7208 | 0.6500 | 1.0434 | -35.6923 | -15.8766 | -1.1616 | -1.1541 |
| 0.325 | 5.3333 | 400 | 0.3591 | 0.3264 | -0.7757 | 0.6500 | 1.1021 | -36.2415 | -15.8395 | -1.1621 | -1.1544 |
| 0.3158 | 5.6667 | 425 | 0.3525 | 0.3330 | -0.8164 | 0.6500 | 1.1494 | -36.6489 | -15.7737 | -1.1630 | -1.1550 |
| 0.2902 | 6.0 | 450 | 0.3457 | 0.3390 | -0.8602 | 0.6500 | 1.1992 | -37.0867 | -15.7133 | -1.1632 | -1.1549 |
| 0.343 | 6.3333 | 475 | 0.3412 | 0.3436 | -0.8920 | 0.6500 | 1.2356 | -37.4049 | -15.6674 | -1.1637 | -1.1553 |
| 0.3655 | 6.6667 | 500 | 0.3365 | 0.3468 | -0.9263 | 0.6500 | 1.2731 | -37.7472 | -15.6348 | -1.1639 | -1.1552 |
| 0.2822 | 7.0 | 525 | 0.3326 | 0.3524 | -0.9533 | 0.6500 | 1.3057 | -38.0177 | -15.5791 | -1.1644 | -1.1556 |
| 0.2526 | 7.3333 | 550 | 0.3298 | 0.3555 | -0.9743 | 0.6500 | 1.3299 | -38.2280 | -15.5482 | -1.1647 | -1.1557 |
| 0.318 | 7.6667 | 575 | 0.3275 | 0.3569 | -0.9949 | 0.6500 | 1.3517 | -38.4333 | -15.5346 | -1.1645 | -1.1554 |
| 0.3145 | 8.0 | 600 | 0.3255 | 0.3586 | -1.0129 | 0.6500 | 1.3715 | -38.6135 | -15.5168 | -1.1652 | -1.1559 |
| 0.2851 | 8.3333 | 625 | 0.3241 | 0.3589 | -1.0262 | 0.6500 | 1.3852 | -38.7470 | -15.5140 | -1.1652 | -1.1558 |
| 0.1756 | 8.6667 | 650 | 0.3228 | 0.3600 | -1.0375 | 0.6500 | 1.3975 | -38.8595 | -15.5034 | -1.1652 | -1.1557 |
| 0.2868 | 9.0 | 675 | 0.3217 | 0.3607 | -1.0476 | 0.6500 | 1.4083 | -38.9610 | -15.4963 | -1.1654 | -1.1559 |
| 0.2786 | 9.3333 | 700 | 0.3209 | 0.3622 | -1.0522 | 0.6500 | 1.4143 | -39.0064 | -15.4818 | -1.1654 | -1.1558 |
| 0.2804 | 9.6667 | 725 | 0.3208 | 0.3616 | -1.0562 | 0.6500 | 1.4178 | -39.0471 | -15.4874 | -1.1654 | -1.1558 |
| 0.3682 | 10.0 | 750 | 0.3205 | 0.3614 | -1.0595 | 0.6500 | 1.4209 | -39.0792 | -15.4894 | -1.1655 | -1.1559 |
| 0.2618 | 10.3333 | 775 | 0.3205 | 0.3604 | -1.0603 | 0.6500 | 1.4208 | -39.0879 | -15.4988 | -1.1653 | -1.1556 |
| 0.2804 | 10.6667 | 800 | 0.3206 | 0.3617 | -1.0597 | 0.6500 | 1.4214 | -39.0821 | -15.4862 | -1.1653 | -1.1557 |
| 0.3001 | 11.0 | 825 | 0.3203 | 0.3631 | -1.0587 | 0.6500 | 1.4218 | -39.0720 | -15.4725 | -1.1653 | -1.1556 |
| 0.3397 | 11.3333 | 850 | 0.3201 | 0.3635 | -1.0606 | 0.6500 | 1.4241 | -39.0906 | -15.4681 | -1.1655 | -1.1558 |
| 0.2398 | 11.6667 | 875 | 0.3202 | 0.3612 | -1.0617 | 0.6500 | 1.4229 | -39.1017 | -15.4914 | -1.1653 | -1.1557 |
| 0.2255 | 12.0 | 900 | 0.3201 | 0.3629 | -1.0600 | 0.6500 | 1.4229 | -39.0848 | -15.4745 | -1.1656 | -1.1560 |
| 0.2491 | 12.3333 | 925 | 0.3201 | 0.3642 | -1.0596 | 0.6500 | 1.4237 | -39.0803 | -15.4615 | -1.1656 | -1.1559 |
| 0.2946 | 12.6667 | 950 | 0.3201 | 0.3631 | -1.0607 | 0.6500 | 1.4238 | -39.0917 | -15.4719 | -1.1656 | -1.1559 |
| 0.2648 | 13.0 | 975 | 0.3201 | 0.3631 | -1.0607 | 0.6500 | 1.4238 | -39.0917 | -15.4719 | -1.1656 | -1.1559 |
| 0.3553 | 13.3333 | 1000 | 0.3201 | 0.3631 | -1.0607 | 0.6500 | 1.4238 | -39.0917 | -15.4719 | -1.1656 | -1.1559 |

### Framework versions

- Transformers 4.41.2
- Pytorch 2.0.0+cu117
- Datasets 2.19.2
- Tokenizers 0.19.1
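
A gloss on the reward columns in the results tables above (added here as an interpretation of trl's `DPOTrainer` logging, not something stated in the original card): the reward assigned to a completion is the β-scaled log-probability ratio between the trained policy and the frozen reference model,

$$ r(x, y) = \beta\,\bigl(\log \pi_\theta(y \mid x) - \log \pi_{\text{ref}}(y \mid x)\bigr), $$

so `Rewards/margins` is `Rewards/chosen` minus `Rewards/rejected`, and `Rewards/accuracies` is the fraction of evaluation pairs whose chosen completion receives the higher reward. Assuming β = 0.1 (inferred from the `01beta` suffix of the model name), the numbers are internally consistent: the final `Logps/chosen` of −15.47 sits roughly 3.6 nats above its earliest logged value of about −19.09 (which is essentially the reference model, given the tiny early updates), and 0.1 × 3.6 ≈ 0.36 matches the reported chosen reward of 0.3631.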
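For reference, the hyperparameters listed above map onto a trl DPO run roughly as follows. This is a hedged sketch rather than the author's actual script: the preference dataset is unspecified ("unknown dataset"), β = 0.1 is inferred from the model-name suffix, and exact argument names vary between trl releases (the card's tags only confirm trl + dpo).

```python
# Hedged sketch of a DPO run matching the hyperparameters listed above.
# Assumptions: beta=0.1 from the "01beta" model-name suffix, a placeholder
# preference dataset, and argument names as in trl ~0.9.x.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/UTI_L3_1000steps_1e5rate_SFT"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Placeholder preference data: DPO expects prompt / chosen / rejected columns.
train_dataset = Dataset.from_dict({
    "prompt": ["..."],
    "chosen": ["..."],
    "rejected": ["..."],
})

args = DPOConfig(
    output_dir="UTI2_L3_1000steps_1e7rate_01beta_CSFTDPO",
    beta=0.1,                        # assumed from the model-name suffix
    learning_rate=1e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # effective train batch size of 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,        # trl clones the policy as the frozen reference
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,   # newer trl releases use processing_class= instead
)
trainer.train()
```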
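Since the usage sections above are placeholders, here is a minimal inference sketch with `transformers`. The repository id is assumed from the model-index name, and the prompt is purely illustrative; nothing in the card confirms an intended prompt format.

```python
# Minimal inference sketch. Assumptions: repo id inferred from the model-index
# name above; the prompt is illustrative only. Requires accelerate for
# device_map="auto".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/UTI2_L3_1000steps_1e7rate_01beta_CSFTDPO"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit an 8B-class model on one GPU
    device_map="auto",
)

prompt = "A patient reports symptoms consistent with a UTI. What follow-up questions should be asked?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

If the tokenizer ships a chat template inherited from the Llama 3 base, `tokenizer.apply_chat_template` may be a better fit than a raw string prompt.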