---
license: apache-2.0
base_model: tsavage68/Summary4500_M2_200steps_1e7rate_SFT
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: Hyponatremia_M2_1000steps_1e8rate_03beta_CSFTDPO
  results: []
---

# Hyponatremia_M2_1000steps_1e8rate_03beta_CSFTDPO

This model is a fine-tuned version of [tsavage68/Summary4500_M2_200steps_1e7rate_SFT](https://huggingface.co/tsavage68/Summary4500_M2_200steps_1e7rate_SFT) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6747
- Rewards/chosen: 0.0004
- Rewards/rejected: -0.0376
- Rewards/accuracies: 0.7520
- Rewards/margins: 0.0381
- Logps/rejected: -153.1061
- Logps/chosen: -93.7354
- Logits/rejected: -2.3509
- Logits/chosen: -2.3035

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-08
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6837 | 0.0112 | 50 | 0.6927 | -0.0020 | -0.0036 | 0.4960 | 0.0017 | -152.7662 | -93.7594 | -2.3523 | -2.3049 |
| 0.6845 | 0.0224 | 100 | 0.6912 | 0.0015 | -0.0032 | 0.5580 | 0.0047 | -152.7614 | -93.7247 | -2.3530 | -2.3055 |
| 0.6844 | 0.0336 | 150 | 0.6870 | -0.0007 | -0.0140 | 0.5640 | 0.0133 | -152.8696 | -93.7470 | -2.3520 | -2.3046 |
| 0.6667 | 0.0448 | 200 | 0.6836 | -0.0018 | -0.0219 | 0.6100 | 0.0201 | -152.9489 | -93.7576 | -2.3522 | -2.3048 |
| 0.6785 | 0.0559 | 250 | 0.6806 | -0.0009 | -0.0271 | 0.6940 | 0.0261 | -153.0003 | -93.7490 | -2.3522 | -2.3048 |
| 0.6748 | 0.0671 | 300 | 0.6782 | -0.0019 | -0.0329 | 0.6920 | 0.0311 | -153.0590 | -93.7583 | -2.3512 | -2.3038 |
| 0.6987 | 0.0783 | 350 | 0.6757 | -0.0017 | -0.0378 | 0.7140 | 0.0361 | -153.1073 | -93.7563 | -2.3508 | -2.3034 |
| 0.6323 | 0.0895 | 400 | 0.6739 | -0.0008 | -0.0406 | 0.7560 | 0.0398 | -153.1360 | -93.7480 | -2.3511 | -2.3037 |
| 0.6642 | 0.1007 | 450 | 0.6753 | -0.0004 | -0.0375 | 0.7340 | 0.0371 | -153.1046 | -93.7441 | -2.3508 | -2.3034 |
| 0.6692 | 0.1119 | 500 | 0.6726 | -0.0014 | -0.0438 | 0.7680 | 0.0424 | -153.1682 | -93.7542 | -2.3499 | -2.3025 |
| 0.6745 | 0.1231 | 550 | 0.6736 | -0.0002 | -0.0406 | 0.7320 | 0.0404 | -153.1359 | -93.7421 | -2.3513 | -2.3039 |
| 0.6661 | 0.1343 | 600 | 0.6741 | -0.0001 | -0.0398 | 0.7560 | 0.0396 | -153.1274 | -93.7413 | -2.3514 | -2.3040 |
| 0.6629 | 0.1454 | 650 | 0.6739 | 0.0002 | -0.0397 | 0.7400 | 0.0398 | -153.1265 | -93.7381 | -2.3518 | -2.3043 |
| 0.6572 | 0.1566 | 700 | 0.6731 | -0.0007 | -0.0422 | 0.7460 | 0.0415 | -153.1519 | -93.7464 | -2.3499 | -2.3025 |
| 0.6694 | 0.1678 | 750 | 0.6742 | -0.0011 | -0.0403 | 0.7380 | 0.0392 | -153.1327 | -93.7505 | -2.3509 | -2.3034 |
| 0.6763 | 0.1790 | 800 | 0.6717 | 0.0021 | -0.0424 | 0.7660 | 0.0445 | -153.1538 | -93.7188 | -2.3509 | -2.3035 |
| 0.669 | 0.1902 | 850 | 0.6758 | -0.0005 | -0.0364 | 0.7320 | 0.0358 | -153.0933 | -93.7452 | -2.3509 | -2.3035 |
| 0.6696 | 0.2014 | 900 | 0.6747 | 0.0004 | -0.0376 | 0.7520 | 0.0381 | -153.1061 | -93.7354 | -2.3509 | -2.3035 |
| 0.6593 | 0.2126 | 950 | 0.6747 | 0.0004 | -0.0376 | 0.7520 | 0.0381 | -153.1061 | -93.7354 | -2.3509 | -2.3035 |
| 0.6831 | 0.2238 | 1000 | 0.6747 | 0.0004 | -0.0376 | 0.7520 | 0.0381 | -153.1061 | -93.7354 | -2.3509 | -2.3035 |

### Framework versions

- Transformers 4.42.4
- Pytorch 2.0.0+cu117
- Datasets 2.20.0
- Tokenizers 0.19.1
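The "03beta" in the model name and the rewards columns in the table correspond to the DPO objective with β = 0.3: each reward is β times the difference between the policy and reference log-probabilities of a completion, and the loss is the negative log-sigmoid of the chosen-vs-rejected reward margin. As a minimal sketch with made-up log-probabilities (not values from this training run), the per-pair loss can be computed as:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.3):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    The rewards match the Rewards/chosen and Rewards/rejected columns:
    beta * (policy logp - reference logp) for each completion.
    """
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
    return loss, reward_chosen, reward_rejected

# Hypothetical summed log-probabilities for one preference pair:
loss, r_c, r_r = dpo_loss(policy_chosen_logp=-93.7, policy_rejected_logp=-153.2,
                          ref_chosen_logp=-93.7, ref_rejected_logp=-153.1)
```

With a small margin like this, the loss stays close to ln 2 ≈ 0.693, which is why the validation loss above only drifts from 0.6927 to 0.6747 even as reward accuracy climbs to 0.752: at a learning rate of 1e-08, the policy barely moves from the reference model.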