---
base_model: princeton-nlp/Llama-3-Base-8B-SFT
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: llama3-wpo-lora
  results: []
---

# llama3-wpo-lora

This model is a fine-tuned version of [princeton-nlp/Llama-3-Base-8B-SFT](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT). The training dataset was not recorded by the Trainer.
At the end of training, it achieves the following results on the evaluation set:
- Loss: 0.5172
- Rewards/chosen: -0.0494
- Rewards/rejected: -0.9056
- Rewards/accuracies: 0.7300
- Rewards/margins: 0.8562
- Logps/rejected: -285.7321
- Logps/chosen: -293.0410
- Logps/ref Response: -0.5364
- Logits/rejected: -0.3074
- Logits/chosen: -0.3445
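
The reward metrics above are related to each other. As a minimal sketch, assuming this run follows the usual TRL DPO logging convention in which `Rewards/margins` is the mean of the per-example difference between chosen and rejected rewards (the trainer used for this WPO variant may differ), the reported margin can be checked against the two reward means:

```python
# Values copied from the evaluation summary above.
rewards_chosen = -0.0494
rewards_rejected = -0.9056
reported_margin = 0.8562

# Assumption: margin = mean(chosen - rejected), which equals the
# difference of the two reported means.
margin = rewards_chosen - rewards_rejected

# Allow for rounding to four decimal places in the reported values.
assert abs(margin - reported_margin) < 1e-4
print(f"margin = {margin:.4f}")
```

A positive margin means the policy assigns a higher implicit reward to chosen responses than to rejected ones on average; `Rewards/accuracies` is the fraction of evaluation pairs for which that holds.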

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 1
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
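
The total batch sizes listed above follow from the per-device values. A short sketch of the arithmetic, using only the numbers from the list (variable names mirror the hyperparameter names):

```python
# Per-device settings from the hyperparameter list.
train_batch_size = 1
eval_batch_size = 4
num_devices = 4
gradient_accumulation_steps = 16

# Each optimizer step sees one micro-batch per device per accumulation step.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Evaluation does not accumulate gradients, so only the device count matters.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)
```

This reproduces the reported `total_train_batch_size` of 64 and `total_eval_batch_size` of 16.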

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logps/ref Response | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:------------------:|:---------------:|:-------------:|
| 0.6142        | 0.1047 | 100  | 0.5973          | 0.2024         | -0.1309          | 0.7020             | 0.3333          | -277.9861      | -290.5232    | -0.5364            | -0.5487         | -0.5543       |
| 0.5579        | 0.2094 | 200  | 0.5483          | -0.0751        | -0.7065          | 0.7120             | 0.6313          | -283.7411      | -293.2985    | -0.5364            | -0.4847         | -0.5042       |
| 0.5402        | 0.3141 | 300  | 0.5354          | -0.1318        | -0.8578          | 0.7260             | 0.7260          | -285.2545      | -293.8653    | -0.5364            | -0.4387         | -0.4637       |
| 0.5112        | 0.4187 | 400  | 0.5277          | -0.1698        | -0.9670          | 0.7220             | 0.7973          | -286.3469      | -294.2450    | -0.5364            | -0.3715         | -0.4030       |
| 0.5319        | 0.5234 | 500  | 0.5212          | -0.1546        | -0.9783          | 0.7260             | 0.8237          | -286.4595      | -294.0932    | -0.5364            | -0.3377         | -0.3727       |
| 0.5155        | 0.6281 | 600  | 0.5195          | -0.0851        | -0.9285          | 0.7360             | 0.8434          | -285.9612      | -293.3980    | -0.5364            | -0.3247         | -0.3608       |
| 0.5113        | 0.7328 | 700  | 0.5173          | -0.1941        | -1.0489          | 0.7340             | 0.8547          | -287.1652      | -294.4885    | -0.5364            | -0.3036         | -0.3411       |
| 0.5268        | 0.8375 | 800  | 0.5177          | -0.0457        | -0.9023          | 0.7220             | 0.8566          | -285.7000      | -293.0044    | -0.5364            | -0.3082         | -0.3453       |
| 0.4923        | 0.9422 | 900  | 0.5175          | -0.0517        | -0.9092          | 0.7280             | 0.8575          | -285.7691      | -293.0645    | -0.5364            | -0.3072         | -0.3443       |
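
The Epoch and Step columns make it possible to estimate the length of the run. The sketch below is an approximation derived only from the logged values, not a figure recorded by the Trainer; it assumes the Epoch column is the fraction of the single epoch completed at each step, and that warmup steps are computed as the ceiling of `lr_scheduler_warmup_ratio` times the total step count.

```python
import math

# (epoch, step) pairs copied from the training results table.
logged = [
    (0.1047, 100), (0.2094, 200), (0.3141, 300),
    (0.4187, 400), (0.5234, 500), (0.6281, 600),
    (0.7328, 700), (0.8375, 800), (0.9422, 900),
]

# Each row gives an independent estimate of optimizer steps per epoch.
estimates = [step / epoch for epoch, step in logged]
steps_per_epoch = round(sum(estimates) / len(estimates))

# Assumption: warmup covers ceil(ratio * total_steps) steps of the 1-epoch run.
warmup_ratio = 0.1
warmup_steps = math.ceil(warmup_ratio * steps_per_epoch)

# Approximate number of training examples, given total_train_batch_size = 64.
approx_train_examples = steps_per_epoch * 64

print(steps_per_epoch, warmup_steps, approx_train_examples)
```

Under these assumptions the run is roughly 955 optimizer steps long, so the first evaluation at step 100 falls just after the learning-rate warmup ends.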


### Framework versions

- PEFT 0.7.1
- Transformers 4.44.2
- PyTorch 2.2.1+cu121
- Datasets 2.14.6
- Tokenizers 0.19.1