metadata
license: apache-2.0
base_model: tsavage68/Summary4500_M2_200steps_1e7rate_SFT
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: Hyponatremia_M2_1000steps_1e7rate_05beta_CSFTDPO
    results: []

Hyponatremia_M2_1000steps_1e7rate_05beta_CSFTDPO

This model is a DPO fine-tune of tsavage68/Summary4500_M2_200steps_1e7rate_SFT, trained with TRL on an unspecified dataset. At the final training step (1000), it achieves the following results on the evaluation set:

  • Loss: 0.0014
  • Rewards/chosen: -1.1967
  • Rewards/rejected: -19.9702
  • Rewards/accuracies: 0.9980
  • Rewards/margins: 18.7736
  • Logps/rejected: -192.6703
  • Logps/chosen: -96.1331
  • Logits/rejected: -2.2591
  • Logits/chosen: -2.2130
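The reward metrics above are related by simple arithmetic, which the sketch below works through. It assumes the sigmoid DPO loss (TRL's default loss type; the card does not state which loss type was used), under which the loss for one preference pair is -log(sigmoid(margin)). The function name `dpo_sigmoid_loss` is illustrative and not part of any library.

```python
import math

# Final evaluation rewards as reported in the list above.
rewards_chosen = -1.1967
rewards_rejected = -19.9702

# Rewards/margins is the gap between the chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected


def dpo_sigmoid_loss(reward_margin: float) -> float:
    """Sigmoid DPO loss for a single pair: -log(sigmoid(margin)).

    Assumption: the sigmoid loss type was used; the card does not say.
    """
    # log1p(exp(-x)) equals -log(sigmoid(x)) and is stable for large positive x.
    return math.log1p(math.exp(-reward_margin))


print(f"margin = {margin:.4f}")
print(f"loss for one pair at that margin = {dpo_sigmoid_loss(margin):.2e}")
```

The computed margin differs from the reported 18.7736 only in the last digit, which is consistent with rounding of the individual rewards. The loss at the mean margin is far below the reported 0.0014 because the reported value is a mean over pairs, and a mean of per-pair losses need not equal the loss at the mean margin.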

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-07
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
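To make the learning-rate settings above concrete, here is a minimal pure-Python sketch of a cosine schedule with linear warmup. It assumes the usual shape of such a schedule (linear ramp from 0 to the peak rate over the warmup steps, then a half-cosine decay to 0 at the final step); the function name `lr_at` is illustrative, and the sketch is not the trainer's own code.

```python
import math

PEAK_LR = 1e-07      # learning_rate
WARMUP_STEPS = 100   # lr_scheduler_warmup_steps
TOTAL_STEPS = 1000   # training_steps


def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer updates (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Half-cosine decay from the peak rate down to 0 at the final step.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at(step):.3e}")
```

Under these assumptions the rate peaks at step 100, falls to half the peak at step 550 (the midpoint of the decay phase), and reaches 0 at step 1000.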

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1654 | 0.0112 | 50 | 0.1737 | -0.0508 | -1.9689 | 0.9980 | 1.9181 | -156.6675 | -93.8414 | -2.3390 | -2.2918 |
| 0.0004 | 0.0224 | 100 | 0.0033 | -0.7167 | -11.2258 | 0.9980 | 10.5091 | -175.1814 | -95.1731 | -2.2885 | -2.2417 |
| 0.0 | 0.0336 | 150 | 0.0021 | -1.0525 | -14.0087 | 0.9980 | 12.9562 | -180.7471 | -95.8447 | -2.2792 | -2.2326 |
| 0.0 | 0.0448 | 200 | 0.0015 | -0.9521 | -16.6286 | 0.9980 | 15.6764 | -185.9869 | -95.6440 | -2.2677 | -2.2212 |
| 0.0 | 0.0559 | 250 | 0.0015 | -0.9657 | -17.2380 | 0.9980 | 16.2723 | -187.2058 | -95.6713 | -2.2669 | -2.2206 |
| 0.0 | 0.0671 | 300 | 0.0015 | -0.9637 | -17.2446 | 0.9980 | 16.2809 | -187.2190 | -95.6673 | -2.2665 | -2.2201 |
| 0.0 | 0.0783 | 350 | 0.0015 | -1.1980 | -18.6860 | 0.9980 | 17.4880 | -190.1018 | -96.1359 | -2.2620 | -2.2159 |
| 0.0001 | 0.0895 | 400 | 0.0014 | -1.2301 | -19.6059 | 0.9980 | 18.3757 | -191.9415 | -96.2000 | -2.2577 | -2.2117 |
| 0.0 | 0.1007 | 450 | 0.0015 | -1.2380 | -19.6415 | 0.9980 | 18.4035 | -192.0128 | -96.2158 | -2.2573 | -2.2113 |
| 0.0 | 0.1119 | 500 | 0.0014 | -1.2365 | -19.6568 | 0.9980 | 18.4203 | -192.0434 | -96.2128 | -2.2581 | -2.2121 |
| 0.0 | 0.1231 | 550 | 0.0014 | -1.2308 | -19.8868 | 0.9980 | 18.6559 | -192.5033 | -96.2015 | -2.2587 | -2.2127 |
| 0.0 | 0.1343 | 600 | 0.0014 | -1.2131 | -19.8634 | 0.9980 | 18.6504 | -192.4567 | -96.1659 | -2.2581 | -2.2121 |
| 0.0 | 0.1454 | 650 | 0.0014 | -1.1869 | -19.8805 | 0.9980 | 18.6936 | -192.4907 | -96.1136 | -2.2606 | -2.2145 |
| 0.0 | 0.1566 | 700 | 0.0014 | -1.2139 | -19.9693 | 0.9980 | 18.7554 | -192.6684 | -96.1675 | -2.2588 | -2.2127 |
| 0.0 | 0.1678 | 750 | 0.0014 | -1.1965 | -19.9802 | 0.9980 | 18.7837 | -192.6902 | -96.1328 | -2.2595 | -2.2134 |
| 0.0 | 0.1790 | 800 | 0.0014 | -1.1843 | -19.9036 | 0.9980 | 18.7193 | -192.5370 | -96.1084 | -2.2606 | -2.2145 |
| 0.0 | 0.1902 | 850 | 0.0014 | -1.1914 | -19.9692 | 0.9980 | 18.7778 | -192.6682 | -96.1225 | -2.2591 | -2.2130 |
| 0.0 | 0.2014 | 900 | 0.0014 | -1.1979 | -19.9798 | 0.9980 | 18.7819 | -192.6894 | -96.1356 | -2.2589 | -2.2128 |
| 0.0 | 0.2126 | 950 | 0.0014 | -1.1962 | -19.9695 | 0.9980 | 18.7733 | -192.6688 | -96.1321 | -2.2591 | -2.2130 |
| 0.0 | 0.2238 | 1000 | 0.0014 | -1.1967 | -19.9702 | 0.9980 | 18.7736 | -192.6703 | -96.1331 | -2.2591 | -2.2130 |
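The Epoch and Step columns in the table above give a rough handle on the size of the otherwise undocumented training set. The sketch below is a back-of-the-envelope estimate only: it assumes a single device and no gradient accumulation (the card lists neither), so each step consumes `train_batch_size` examples. The helper `estimate_dataset_size` is hypothetical.

```python
TRAIN_BATCH_SIZE = 1  # train_batch_size from the hyperparameters

# (step, epoch) pairs copied from the first and last rows of the table.
ROWS = [(50, 0.0112), (1000, 0.2238)]


def estimate_dataset_size(step: int, epoch: float, batch_size: int = TRAIN_BATCH_SIZE) -> float:
    """Examples per epoch implied by one (step, epoch) pair.

    Assumes one device and no gradient accumulation, so the number of
    examples seen so far is step * batch_size.
    """
    return step * batch_size / epoch


for step, epoch in ROWS:
    print(f"step {step}: about {estimate_dataset_size(step, epoch):.0f} examples per epoch")
```

Both rows point to a training set somewhere around 4,460 to 4,470 preference pairs under these assumptions; the spread comes from the epoch column being rounded to four decimals. It also follows that the 1000 steps covered only about 22% of one pass over the data.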

Framework versions

  • Transformers 4.42.4
  • Pytorch 2.0.0+cu117
  • Datasets 2.20.0
  • Tokenizers 0.19.1
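
When trying to reproduce the run, it can help to compare the local environment against the versions listed above. The following is a small sketch using only the standard library; the distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are the usual PyPI names and are an assumption, as is the helper name `compare_versions`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by (assumed) PyPI distribution name.
EXPECTED = {
    "transformers": "4.42.4",
    "torch": "2.0.0+cu117",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def compare_versions(expected):
    """Map each package to (expected version, installed version or None)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(EXPECTED).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: expected {wanted}, found {installed} ({status})")
```

A mismatch does not necessarily prevent loading the model; the check only reports where the environment differs from the one recorded here.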