---
license: apache-2.0
base_model: tsavage68/Summary4500_M2_200steps_1e7rate_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: Hyponatremia_M2_1000steps_1e8rate_03beta_CSFTDPO
  results: []
---

# Hyponatremia_M2_1000steps_1e8rate_03beta_CSFTDPO

This model is a fine-tuned version of [tsavage68/Summary4500_M2_200steps_1e7rate_SFT](https://huggingface.co/tsavage68/Summary4500_M2_200steps_1e7rate_SFT) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6747
- Rewards/chosen: 0.0004
- Rewards/rejected: -0.0376
- Rewards/accuracies: 0.7520
- Rewards/margins: 0.0381
- Logps/rejected: -153.1061
- Logps/chosen: -93.7354
- Logits/rejected: -2.3509
- Logits/chosen: -2.3035

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-08
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6837 | 0.0112 | 50 | 0.6927 | -0.0020 | -0.0036 | 0.4960 | 0.0017 | -152.7662 | -93.7594 | -2.3523 | -2.3049 |
| 0.6845 | 0.0224 | 100 | 0.6912 | 0.0015 | -0.0032 | 0.5580 | 0.0047 | -152.7614 | -93.7247 | -2.3530 | -2.3055 |
| 0.6844 | 0.0336 | 150 | 0.6870 | -0.0007 | -0.0140 | 0.5640 | 0.0133 | -152.8696 | -93.7470 | -2.3520 | -2.3046 |
| 0.6667 | 0.0448 | 200 | 0.6836 | -0.0018 | -0.0219 | 0.6100 | 0.0201 | -152.9489 | -93.7576 | -2.3522 | -2.3048 |
| 0.6785 | 0.0559 | 250 | 0.6806 | -0.0009 | -0.0271 | 0.6940 | 0.0261 | -153.0003 | -93.7490 | -2.3522 | -2.3048 |
| 0.6748 | 0.0671 | 300 | 0.6782 | -0.0019 | -0.0329 | 0.6920 | 0.0311 | -153.0590 | -93.7583 | -2.3512 | -2.3038 |
| 0.6987 | 0.0783 | 350 | 0.6757 | -0.0017 | -0.0378 | 0.7140 | 0.0361 | -153.1073 | -93.7563 | -2.3508 | -2.3034 |
| 0.6323 | 0.0895 | 400 | 0.6739 | -0.0008 | -0.0406 | 0.7560 | 0.0398 | -153.1360 | -93.7480 | -2.3511 | -2.3037 |
| 0.6642 | 0.1007 | 450 | 0.6753 | -0.0004 | -0.0375 | 0.7340 | 0.0371 | -153.1046 | -93.7441 | -2.3508 | -2.3034 |
| 0.6692 | 0.1119 | 500 | 0.6726 | -0.0014 | -0.0438 | 0.7680 | 0.0424 | -153.1682 | -93.7542 | -2.3499 | -2.3025 |
| 0.6745 | 0.1231 | 550 | 0.6736 | -0.0002 | -0.0406 | 0.7320 | 0.0404 | -153.1359 | -93.7421 | -2.3513 | -2.3039 |
| 0.6661 | 0.1343 | 600 | 0.6741 | -0.0001 | -0.0398 | 0.7560 | 0.0396 | -153.1274 | -93.7413 | -2.3514 | -2.3040 |
| 0.6629 | 0.1454 | 650 | 0.6739 | 0.0002 | -0.0397 | 0.7400 | 0.0398 | -153.1265 | -93.7381 | -2.3518 | -2.3043 |
| 0.6572 | 0.1566 | 700 | 0.6731 | -0.0007 | -0.0422 | 0.7460 | 0.0415 | -153.1519 | -93.7464 | -2.3499 | -2.3025 |
| 0.6694 | 0.1678 | 750 | 0.6742 | -0.0011 | -0.0403 | 0.7380 | 0.0392 | -153.1327 | -93.7505 | -2.3509 | -2.3034 |
| 0.6763 | 0.1790 | 800 | 0.6717 | 0.0021 | -0.0424 | 0.7660 | 0.0445 | -153.1538 | -93.7188 | -2.3509 | -2.3035 |
| 0.669 | 0.1902 | 850 | 0.6758 | -0.0005 | -0.0364 | 0.7320 | 0.0358 | -153.0933 | -93.7452 | -2.3509 | -2.3035 |
| 0.6696 | 0.2014 | 900 | 0.6747 | 0.0004 | -0.0376 | 0.7520 | 0.0381 | -153.1061 | -93.7354 | -2.3509 | -2.3035 |
| 0.6593 | 0.2126 | 950 | 0.6747 | 0.0004 | -0.0376 | 0.7520 | 0.0381 | -153.1061 | -93.7354 | -2.3509 | -2.3035 |
| 0.6831 | 0.2238 | 1000 | 0.6747 | 0.0004 | -0.0376 | 0.7520 | 0.0381 | -153.1061 | -93.7354 | -2.3509 | -2.3035 |

### Framework versions

- Transformers 4.42.4
- Pytorch 2.0.0+cu117
- Datasets 2.20.0
- Tokenizers 0.19.1
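The "03beta" in the model name and the rewards columns in the table correspond to the DPO objective with β = 0.3: each reward is β times the difference between the policy and reference log-probabilities of a completion, and the loss is the negative log-sigmoid of the chosen-vs-rejected reward margin. As a minimal sketch with made-up log-probabilities (not values from this training run), the per-pair loss can be computed as:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.3):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    The rewards match the Rewards/chosen and Rewards/rejected columns:
    beta * (policy logp - reference logp) for each completion.
    """
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
    return loss, reward_chosen, reward_rejected

# Hypothetical summed log-probabilities for one preference pair:
loss, r_c, r_r = dpo_loss(policy_chosen_logp=-93.7, policy_rejected_logp=-153.2,
                          ref_chosen_logp=-93.7, ref_rejected_logp=-153.1)
```

With a small margin like this, the loss stays close to ln 2 ≈ 0.693, which is why the validation loss above only drifts from 0.6927 to 0.6747 even as reward accuracy climbs to 0.752: at a learning rate of 1e-08, the policy barely moves from the reference model.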