---
library_name: transformers
license: llama3.2
base_model: meta-llama/Llama-3.2-3B-Instruct
tags:
- trl
- orpo
- generated_from_trainer
model-index:
- name: results
  results: []
---


# results

This model is a fine-tuned version of [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct). The training dataset is not recorded in this card.
At the final evaluation (step 60, epoch 9.6), it achieves the following results on the evaluation set:
- Loss: 0.7865
- Rewards/chosen: -0.0728
- Rewards/rejected: -0.1410
- Rewards/accuracies: 1.0
- Rewards/margins: 0.0682
- Logps/rejected: -1.4102
- Logps/chosen: -0.7284
- Logits/rejected: -1.3629
- Logits/chosen: -1.0739
- Nll Loss: 0.7297
- Log Odds Ratio: -0.3156
- Log Odds Chosen: 1.0813
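
The preference metrics above are internally consistent, which a few lines of arithmetic can confirm. The sketch below uses only the values listed above, and the variable names are illustrative. Reading the 0.1 ratio as the ORPO `beta` coefficient is an assumption: the card does not record `beta`, although 0.1 is TRL's default.

```python
# Final evaluation metrics, copied from the list above.
rewards_chosen, rewards_rejected = -0.0728, -0.1410
logps_chosen, logps_rejected = -0.7284, -1.4102
reported_margin = 0.0682

# The reward margin is the chosen reward minus the rejected reward.
margin = round(rewards_chosen - rewards_rejected, 4)

# In these results each reward is about 0.1 times the matching log-probability.
# Assumption: that scale is the ORPO beta, which the card does not record.
scale_chosen = rewards_chosen / logps_chosen
scale_rejected = rewards_rejected / logps_rejected

print(f"margin = {margin} (reported {reported_margin})")
print(f"reward / logp = {scale_chosen:.3f} (chosen), {scale_rejected:.3f} (rejected)")
```

A positive margin with `Rewards/accuracies` at 1.0 means the model assigned a higher average log-probability to the chosen response than to the rejected one for every evaluation pair.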

## Model description

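Llama-3.2-3B-Instruct fine-tuned with ORPO (Odds Ratio Preference Optimization) using the TRL library, as indicated by the `trl` and `orpo` tags. No further description has been provided.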

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 5
- num_epochs: 10
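
The total batch sizes and the learning-rate schedule follow from the other entries in this list. The sketch below recomputes the totals and evaluates a linear warmup-then-decay schedule of the kind `lr_scheduler_type: linear` selects. `TOTAL_STEPS = 60` is an assumption taken from the last step logged in the training results, and `linear_lr` is an illustrative helper, not a library function.

```python
PER_DEVICE_TRAIN_BATCH = 2
PER_DEVICE_EVAL_BATCH = 2
NUM_DEVICES = 2
GRAD_ACCUM_STEPS = 4
PEAK_LR = 8e-06
WARMUP_STEPS = 5
TOTAL_STEPS = 60  # assumption: last step logged in the training results

# Gradient accumulation multiplies the training batch only; evaluation has no accumulation.
total_train_batch = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch = PER_DEVICE_EVAL_BATCH * NUM_DEVICES


def linear_lr(step: int) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


print(total_train_batch, total_eval_batch)
for step in (0, 5, 30, 60):
    print(step, linear_lr(step))
```

Under these assumptions the rate climbs from zero to the 8e-06 peak over the first 5 steps, then falls linearly back to zero at step 60.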

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
| No log        | 0.8   | 5    | 3.8120          | -0.2526        | -0.2936          | 1.0                | 0.0411          | -2.9364        | -2.5255      | -0.1324         | 0.0341        | 3.7999   | -0.5042        | 0.4449          |
| 4.046         | 1.6   | 10   | 2.5798          | -0.1655        | -0.2025          | 0.8333             | 0.0370          | -2.0247        | -1.6551      | -0.5901         | -0.3606       | 2.5283   | -0.4995        | 0.4487          |
| 4.046         | 2.4   | 15   | 1.7548          | -0.1310        | -0.1707          | 0.8333             | 0.0396          | -1.7065        | -1.3105      | -1.0779         | -0.8067       | 1.6757   | -0.4762        | 0.5189          |
| 1.6806        | 3.2   | 20   | 1.2964          | -0.1081        | -0.1505          | 0.8333             | 0.0423          | -1.5045        | -1.0815      | -1.2319         | -0.9478       | 1.2096   | -0.4555        | 0.5966          |
| 1.6806        | 4.0   | 25   | 1.0927          | -0.0927        | -0.1413          | 0.8333             | 0.0486          | -1.4130        | -0.9266      | -1.2313         | -0.9509       | 1.0155   | -0.4199        | 0.7223          |
| 0.9531        | 4.8   | 30   | 0.9672          | -0.0831        | -0.1381          | 1.0                | 0.0550          | -1.3815        | -0.8311      | -1.2424         | -0.9626       | 0.8961   | -0.3827        | 0.8429          |
| 0.9531        | 5.6   | 35   | 0.8865          | -0.0779        | -0.1375          | 1.0                | 0.0597          | -1.3751        | -0.7785      | -1.2870         | -0.9968       | 0.8182   | -0.3555        | 0.9335          |
| 0.7263        | 6.4   | 40   | 0.8374          | -0.0755        | -0.1388          | 1.0                | 0.0633          | -1.3876        | -0.7545      | -1.3805         | -1.0853       | 0.7755   | -0.3371        | 0.9980          |
| 0.7263        | 7.2   | 45   | 0.8076          | -0.0739        | -0.1400          | 1.0                | 0.0660          | -1.3996        | -0.7393      | -1.3674         | -1.0741       | 0.7480   | -0.3248        | 1.0448          |
| 0.6366        | 8.0   | 50   | 0.7919          | -0.0730        | -0.1405          | 1.0                | 0.0675          | -1.4052        | -0.7297      | -1.3511         | -1.0575       | 0.7335   | -0.3178        | 1.0721          |
| 0.6366        | 8.8   | 55   | 0.7878          | -0.0729        | -0.1410          | 1.0                | 0.0681          | -1.4100        | -0.7293      | -1.3573         | -1.0602       | 0.7302   | -0.3161        | 1.0787          |
| 0.6276        | 9.6   | 60   | 0.7865          | -0.0728        | -0.1410          | 1.0                | 0.0682          | -1.4102        | -0.7284      | -1.3629         | -1.0739       | 0.7297   | -0.3156        | 1.0813          |
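
The epoch and step columns also imply how many optimizer steps make up one epoch. The sketch below derives that figure and checks that validation loss falls at every evaluation. The estimate of the training-set size is an inference that assumes every step consumed a full batch of 16; the card does not state the dataset size.

```python
# (epoch, step, validation loss) for each row of the table above.
rows = [
    (0.8, 5, 3.8120), (1.6, 10, 2.5798), (2.4, 15, 1.7548),
    (3.2, 20, 1.2964), (4.0, 25, 1.0927), (4.8, 30, 0.9672),
    (5.6, 35, 0.8865), (6.4, 40, 0.8374), (7.2, 45, 0.8076),
    (8.0, 50, 0.7919), (8.8, 55, 0.7878), (9.6, 60, 0.7865),
]
TOTAL_TRAIN_BATCH = 16  # total_train_batch_size from the hyperparameters

last_epoch, last_step, last_loss = rows[-1]
steps_per_epoch = last_step / last_epoch

# Assumption: every optimizer step saw a full batch of 16 examples.
approx_examples = steps_per_epoch * TOTAL_TRAIN_BATCH

losses = [loss for _, _, loss in rows]
strictly_decreasing = all(a > b for a, b in zip(losses, losses[1:]))
last_improvement = losses[-2] - losses[-1]

print(f"steps per epoch: {steps_per_epoch:.2f}")
print(f"approximate training examples: {approx_examples:.0f}")
print(f"validation loss strictly decreasing: {strictly_decreasing}")
print(f"improvement at the last evaluation: {last_improvement:.4f}")
```

The last evaluation lowers validation loss by only about 0.0013, which suggests the run had largely plateaued by step 60.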


### Framework versions

- Transformers 4.44.2
- PyTorch 2.2.0+cu121
- Datasets 3.0.0
- Tokenizers 0.19.1
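
To reproduce the run, it can help to compare a local environment against the versions listed above. The sketch below is one way to do that with the standard library; `find_mismatches` is an illustrative helper, and the PyPI distribution names (for example `torch` for PyTorch) are assumptions about how the packages were installed.

```python
from importlib import metadata

# Versions listed above; keys are the assumed PyPI distribution names.
EXPECTED = {
    "transformers": "4.44.2",
    "torch": "2.2.0+cu121",
    "datasets": "3.0.0",
    "tokenizers": "0.19.1",
}


def find_mismatches(expected=EXPECTED, get_version=metadata.version):
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches().items():
        print(f"{name}: expected {wanted}, found {installed}")
```

The comparison is an exact string match, so a PyTorch build with a different local suffix than `+cu121` is reported as a mismatch even when the base version is 2.2.0.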