---
base_model: lvwerra/gpt2-imdb
tags:
- generated_from_trainer
model-index:
- name: gpt-imdb-ipo-beta_0.5
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# gpt-imdb-ipo-beta_0.5

This model is a fine-tuned version of [lvwerra/gpt2-imdb](https://huggingface.co/lvwerra/gpt2-imdb) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9628
- Rewards/chosen: -0.4934
- Rewards/rejected: -0.8358
- Rewards/accuracies: 0.7812
- Rewards/margins: 0.3424
- Logps/rejected: -265.3568
- Logps/chosen: -236.2520
- Logits/rejected: -32.5835
- Logits/chosen: -32.6621
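
The reward metrics above follow the naming used by DPO-style preference trainers such as TRL's `DPOTrainer`. The sketch below is a minimal illustration, **assuming** that convention and a β of 0.5 (read off the model name; neither is confirmed elsewhere in this card). It shows how the logged values are typically derived from per-pair sequence log-probabilities under the policy and a frozen reference model. `PairLogps` and `reward_metrics` are hypothetical names.

```python
# Minimal sketch of how DPO-style trainers typically derive the logged reward metrics.
# ASSUMPTION: TRL-like conventions and beta=0.5 (inferred from the model name, not stated in the card).
from dataclasses import dataclass
from statistics import mean


@dataclass
class PairLogps:
    """Summed log-probabilities of one preference pair (hypothetical helper type)."""
    policy_chosen: float
    policy_rejected: float
    ref_chosen: float
    ref_rejected: float


def reward_metrics(pairs: list[PairLogps], beta: float = 0.5) -> dict[str, float]:
    # Implicit reward of a completion: beta * (policy log-prob - reference log-prob).
    chosen = [beta * (p.policy_chosen - p.ref_chosen) for p in pairs]
    rejected = [beta * (p.policy_rejected - p.ref_rejected) for p in pairs]
    margins = [c - r for c, r in zip(chosen, rejected)]
    return {
        "rewards/chosen": mean(chosen),
        "rewards/rejected": mean(rejected),
        # Fraction of pairs where the chosen completion receives the higher implicit reward.
        "rewards/accuracies": mean(1.0 if m > 0 else 0.0 for m in margins),
        "rewards/margins": mean(margins),
        # Logps/* are the raw policy log-probabilities, before subtracting the reference.
        "logps/chosen": mean(p.policy_chosen for p in pairs),
        "logps/rejected": mean(p.policy_rejected for p in pairs),
    }


pairs = [
    PairLogps(policy_chosen=-230.0, policy_rejected=-260.0, ref_chosen=-229.0, ref_rejected=-258.0),
    PairLogps(policy_chosen=-240.0, policy_rejected=-270.0, ref_chosen=-240.0, ref_rejected=-271.0),
]
print(reward_metrics(pairs))
```

Because the margin is the mean of per-pair differences, it equals Rewards/chosen minus Rewards/rejected, which is consistent with the reported values: -0.4934 - (-0.8358) = 0.3424.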

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 24
- eval_batch_size: 24
- seed: 42
- optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 150
- num_epochs: 3
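
With `lr_scheduler_type: cosine`, Trainer applies a linear warmup followed by a half-period cosine decay to zero, which is the shape of `transformers.get_cosine_schedule_with_warmup`. The pure-Python sketch below reproduces that shape for the listed peak learning rate and warmup steps. The total step count is not given in this card. The example uses a hypothetical `TOTAL_STEPS = 7200`, a rough estimate from the results table, where step 7000 falls at epoch 2.92, or about 2,400 steps per epoch.

```python
import math

PEAK_LR = 1e-5       # learning_rate from the card
WARMUP_STEPS = 150   # lr_scheduler_warmup_steps from the card
TOTAL_STEPS = 7200   # HYPOTHETICAL: rough estimate, not reported in the card


def lr_at(step: int, total_steps: int = TOTAL_STEPS,
          peak_lr: float = PEAK_LR, warmup_steps: int = WARMUP_STEPS) -> float:
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup steps.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-period cosine: full peak_lr right after warmup, 0 at the final step.
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for s in (0, 75, 150, 3675, 7200):
    print(s, lr_at(s))
```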

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 10.732        | 0.21  | 500  | 21.6330         | -0.2465        | -0.4751          | 0.5792             | 0.2286          | -264.6355      | -235.7583    | -34.3644        | -34.6229      |
| 11.0252       | 0.42  | 1000 | 17.5281         | 0.3734         | 0.1008           | 0.5437             | 0.2726          | -263.4837      | -234.5185    | -35.1543        | -35.3784      |
| 17.5294       | 0.63  | 1500 | 18.4782         | -0.4521        | -0.6725          | 0.6208             | 0.2203          | -265.0302      | -236.1696    | -33.9319        | -34.0933      |
| 7.8398        | 0.83  | 2000 | 17.4130         | -0.5472        | -0.6406          | 0.6083             | 0.0933          | -264.9664      | -236.3597    | -34.0128        | -34.1803      |
| 6.2214        | 1.04  | 2500 | 9.4072          | -0.5101        | -0.8182          | 0.6292             | 0.3080          | -265.3216      | -236.2855    | -33.2396        | -33.3578      |
| 9.8652        | 1.25  | 3000 | 13.4878         | -0.6413        | -0.8801          | 0.6375             | 0.2388          | -265.4454      | -236.5479    | -32.0018        | -32.1655      |
| 11.4779       | 1.46  | 3500 | 7.5245          | -0.0755        | -0.3944          | 0.6750             | 0.3189          | -264.4740      | -235.4162    | -32.8982        | -33.0074      |
| 3.9833        | 1.67  | 4000 | 4.4888          | -0.7021        | -1.0680          | 0.6729             | 0.3659          | -265.8214      | -236.6695    | -32.9502        | -33.0304      |
| 3.389         | 1.88  | 4500 | 3.9317          | -0.5045        | -0.8887          | 0.7271             | 0.3841          | -265.4626      | -236.2743    | -32.7817        | -32.8828      |
| 3.2338        | 2.08  | 5000 | 2.4116          | -0.5185        | -0.8672          | 0.7146             | 0.3487          | -265.4196      | -236.3022    | -32.5025        | -32.5681      |
| 1.2381        | 2.29  | 5500 | 2.1558          | -0.5066        | -0.8815          | 0.7458             | 0.3749          | -265.4483      | -236.2784    | -32.3108        | -32.3902      |
| 1.6263        | 2.5   | 6000 | 1.1972          | -0.5280        | -0.8664          | 0.7396             | 0.3384          | -265.4182      | -236.3213    | -32.5356        | -32.6104      |
| 1.0882        | 2.71  | 6500 | 1.1163          | -0.5303        | -0.8584          | 0.7562             | 0.3281          | -265.4022      | -236.3259    | -32.5615        | -32.6406      |
| 1.0559        | 2.92  | 7000 | 0.9628          | -0.4934        | -0.8358          | 0.7812             | 0.3424          | -265.3568      | -236.2520    | -32.5835        | -32.6621      |
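
The Training Loss and Validation Loss columns are presumably the IPO objective. The sketch below shows that loss as TRL's `DPOTrainer` computes it for `loss_type="ipo"`. **Assumptions:** the card does not name the trainer or the loss, so both "IPO" and β = 0.5 are inferred from the model name, and `ipo_loss` is a hypothetical helper. IPO regresses the policy-versus-reference log-ratio difference `h` toward the fixed target 1/(2β), here 1.0, instead of maximizing it without bound.

```python
from statistics import mean


def ipo_loss(policy_chosen: list[float], policy_rejected: list[float],
             ref_chosen: list[float], ref_rejected: list[float], beta: float = 0.5) -> float:
    losses = []
    for pc, pr, rc, rr in zip(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
        # h: how much more the policy prefers chosen over rejected than the reference does.
        h = (pc - pr) - (rc - rr)
        # Squared distance of h from the IPO target 1 / (2 * beta).
        losses.append((h - 1.0 / (2.0 * beta)) ** 2)
    return mean(losses)


# Pair 1 has h = 1.0 (on target, loss 0); pair 2 has h = 0.0 (loss 1); mean loss 0.5.
print(ipo_loss([-10.0, -10.0], [-12.0, -11.0], [-10.0, -10.0], [-11.0, -11.0]))
```

Under the same conventions, each pair's reward margin equals β·h. The final Rewards/margins of 0.3424 would therefore correspond to a mean `h` of about 0.68, still below the target of 1.0.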


### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.1
- Datasets 2.15.0
- Tokenizers 0.15.0