---
license: apache-2.0
base_model: tsavage68/Summary4500_M2_200steps_1e7rate_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: Hyponatremia_M2_1000steps_1e7rate_05beta_CSFTDPO
  results: []
---


# Hyponatremia_M2_1000steps_1e7rate_05beta_CSFTDPO

This model is a fine-tuned version of [tsavage68/Summary4500_M2_200steps_1e7rate_SFT](https://huggingface.co/tsavage68/Summary4500_M2_200steps_1e7rate_SFT), trained with Direct Preference Optimization (DPO) using TRL. The training dataset was not recorded.
It achieves the following results on the evaluation set at the final checkpoint (step 1000):
- Loss: 0.0014
- Rewards/chosen: -1.1967
- Rewards/rejected: -19.9702
- Rewards/accuracies: 0.9980
- Rewards/margins: 18.7736
- Logps/rejected: -192.6703
- Logps/chosen: -96.1331
- Logits/rejected: -2.2591
- Logits/chosen: -2.2130
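
The reward metrics above are linked by simple arithmetic, which the sketch below checks. In DPO, the reward of a response is β times the difference between the policy and reference log-probabilities, and the margin is the chosen reward minus the rejected reward. The value `beta = 0.5` is an assumption read from the `05beta` fragment of the model name; the card does not state it.

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -1.1967
rewards_rejected = -19.9702
logps_chosen = -96.1331
logps_rejected = -192.6703

# Assumption: beta = 0.5, inferred from "05beta" in the model name.
beta = 0.5

# The margin is the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected
print(f"margin = {margin:.4f}")  # card reports 18.7736 (inputs are rounded)

# reward = beta * (policy_logp - reference_logp), so the reference
# log-probability implied by the reported numbers is:
ref_logp_chosen = logps_chosen - rewards_chosen / beta
ref_logp_rejected = logps_rejected - rewards_rejected / beta
print(f"implied reference logp (chosen)   = {ref_logp_chosen:.4f}")
print(f"implied reference logp (rejected) = {ref_logp_rejected:.4f}")
```

Under that assumption the implied reference log-probabilities come out near -93.74 (chosen) and -152.73 (rejected). Repeating the calculation with the step-50 row of the training results gives nearly the same values, which is consistent with a fixed reference model and β = 0.5.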

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
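
To illustrate what these scheduler settings imply, here is a minimal sketch of a cosine schedule with linear warmup, using the peak rate, warmup length, and step count listed above. It follows the usual formula (a linear ramp to the peak, then half a cosine period down to zero) and is an illustration, not code from the training run; the helper name `lr_at` is hypothetical.

```python
import math

PEAK_LR = 1e-07       # learning_rate
WARMUP_STEPS = 100    # lr_scheduler_warmup_steps
TOTAL_STEPS = 1000    # training_steps


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from the peak rate down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the rate at a few representative steps.
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at(step):.3e}")
```

Under this schedule the rate reaches its peak of 1e-07 at step 100, is half the peak at the midpoint of the decay (step 550), and reaches zero at step 1000.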

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.1654        | 0.0112 | 50   | 0.1737          | -0.0508        | -1.9689          | 0.9980             | 1.9181          | -156.6675      | -93.8414     | -2.3390         | -2.2918       |
| 0.0004        | 0.0224 | 100  | 0.0033          | -0.7167        | -11.2258         | 0.9980             | 10.5091         | -175.1814      | -95.1731     | -2.2885         | -2.2417       |
| 0.0           | 0.0336 | 150  | 0.0021          | -1.0525        | -14.0087         | 0.9980             | 12.9562         | -180.7471      | -95.8447     | -2.2792         | -2.2326       |
| 0.0           | 0.0448 | 200  | 0.0015          | -0.9521        | -16.6286         | 0.9980             | 15.6764         | -185.9869      | -95.6440     | -2.2677         | -2.2212       |
| 0.0           | 0.0559 | 250  | 0.0015          | -0.9657        | -17.2380         | 0.9980             | 16.2723         | -187.2058      | -95.6713     | -2.2669         | -2.2206       |
| 0.0           | 0.0671 | 300  | 0.0015          | -0.9637        | -17.2446         | 0.9980             | 16.2809         | -187.2190      | -95.6673     | -2.2665         | -2.2201       |
| 0.0           | 0.0783 | 350  | 0.0015          | -1.1980        | -18.6860         | 0.9980             | 17.4880         | -190.1018      | -96.1359     | -2.2620         | -2.2159       |
| 0.0001        | 0.0895 | 400  | 0.0014          | -1.2301        | -19.6059         | 0.9980             | 18.3757         | -191.9415      | -96.2000     | -2.2577         | -2.2117       |
| 0.0           | 0.1007 | 450  | 0.0015          | -1.2380        | -19.6415         | 0.9980             | 18.4035         | -192.0128      | -96.2158     | -2.2573         | -2.2113       |
| 0.0           | 0.1119 | 500  | 0.0014          | -1.2365        | -19.6568         | 0.9980             | 18.4203         | -192.0434      | -96.2128     | -2.2581         | -2.2121       |
| 0.0           | 0.1231 | 550  | 0.0014          | -1.2308        | -19.8868         | 0.9980             | 18.6559         | -192.5033      | -96.2015     | -2.2587         | -2.2127       |
| 0.0           | 0.1343 | 600  | 0.0014          | -1.2131        | -19.8634         | 0.9980             | 18.6504         | -192.4567      | -96.1659     | -2.2581         | -2.2121       |
| 0.0           | 0.1454 | 650  | 0.0014          | -1.1869        | -19.8805         | 0.9980             | 18.6936         | -192.4907      | -96.1136     | -2.2606         | -2.2145       |
| 0.0           | 0.1566 | 700  | 0.0014          | -1.2139        | -19.9693         | 0.9980             | 18.7554         | -192.6684      | -96.1675     | -2.2588         | -2.2127       |
| 0.0           | 0.1678 | 750  | 0.0014          | -1.1965        | -19.9802         | 0.9980             | 18.7837         | -192.6902      | -96.1328     | -2.2595         | -2.2134       |
| 0.0           | 0.1790 | 800  | 0.0014          | -1.1843        | -19.9036         | 0.9980             | 18.7193         | -192.5370      | -96.1084     | -2.2606         | -2.2145       |
| 0.0           | 0.1902 | 850  | 0.0014          | -1.1914        | -19.9692         | 0.9980             | 18.7778         | -192.6682      | -96.1225     | -2.2591         | -2.2130       |
| 0.0           | 0.2014 | 900  | 0.0014          | -1.1979        | -19.9798         | 0.9980             | 18.7819         | -192.6894      | -96.1356     | -2.2589         | -2.2128       |
| 0.0           | 0.2126 | 950  | 0.0014          | -1.1962        | -19.9695         | 0.9980             | 18.7733         | -192.6688      | -96.1321     | -2.2591         | -2.2130       |
| 0.0           | 0.2238 | 1000 | 0.0014          | -1.1967        | -19.9702         | 0.9980             | 18.7736         | -192.6703      | -96.1331     | -2.2591         | -2.2130       |
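
The Epoch and Step columns together hint at the size of the training set. The sketch below works through that arithmetic, assuming one training example per optimizer step (`train_batch_size: 1`, with no gradient accumulation and a single device, neither of which the card states explicitly).

```python
# (step, epoch) pairs copied from the first and last rows of the table above.
rows = [(50, 0.0112), (1000, 0.2238)]

# Assumption: each optimizer step consumes exactly one training example.
examples_per_step = 1

for step, epoch in rows:
    # epoch = examples_seen / dataset_size, so dataset_size = examples_seen / epoch
    estimate = step * examples_per_step / epoch
    print(f"step {step}: about {estimate:.0f} training examples")

# Estimate from the final row, which has the least relative rounding error.
final_estimate = rows[-1][0] * examples_per_step / rows[-1][1]
```

Both rows point to a training set of roughly 4,460 to 4,470 examples, so the 1000-step run covers less than a quarter of one epoch. The estimate is only as precise as the four-decimal Epoch values allow.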


### Framework versions

- Transformers 4.42.4
- PyTorch 2.0.0+cu117
- Datasets 2.20.0
- Tokenizers 0.19.1