---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- orpo
- trl
datasets:
- alvarobartt/dpo-mix-7k-simplified
base_model: mistralai/Mistral-7B-v0.1
pipeline_tag: text-generation
inference: false
---

## ORPO fine-tune of Mistral 7B v0.1 with DPO Mix 7K

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/60f0608166e5701b80ed3f02/hRyhnTySt-KQ0gnnoclSm.jpeg)

> Stable Diffusion XL "A capybara, a killer whale, and a robot named Ultra being friends"

This is an ORPO fine-tune of [`mistralai/Mistral-7B-v0.1`](https://huggingface.co/mistralai/Mistral-7B-v0.1) with
[`alvarobartt/dpo-mix-7k-simplified`](https://huggingface.co/datasets/alvarobartt/dpo-mix-7k-simplified).

⚠️ Note that the code is still experimental, as the `ORPOTrainer` PR has not been merged yet; follow its progress
at [🤗`trl` - `ORPOTrainer` PR](https://github.com/huggingface/trl/pull/1435).
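Below is a minimal sketch of how a run like this could be set up once the PR branch of `trl` is installed. It assumes the `ORPOConfig` / `ORPOTrainer` API from the PR follows the pattern of the other 🤗`trl` trainers; the argument names and hyperparameters shown here are illustrative, not the exact configuration used for this model.

```python
# Sketch of an ORPO fine-tuning run (assumes the ORPOTrainer PR branch of trl is installed).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer  # only available on the PR branch for now

model_id = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Mistral 7B v0.1 ships without a pad token

# The dataset exposes `prompt`, `chosen`, and `rejected` columns, which is the
# preference format the trainer expects.
dataset = load_dataset("alvarobartt/dpo-mix-7k-simplified", split="train")

# Hypothetical hyperparameters, only for illustration.
args = ORPOConfig(
    output_dir="mistral-7b-v0.1-orpo",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```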

## Reference

[`ORPO: Monolithic Preference Optimization without Reference Model`](https://huggingface.co/papers/2403.07691)
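In short, the paper's objective adds an odds-ratio term on top of the standard supervised fine-tuning loss, so that the chosen completion is favored over the rejected one without needing a separate reference model (a paraphrase of the paper's formulation, with $\lambda$ weighting the odds-ratio term):

$$
\mathcal{L}_{\mathrm{ORPO}} = \mathbb{E}_{(x, y_w, y_l)}\!\left[\mathcal{L}_{\mathrm{SFT}} + \lambda \cdot \mathcal{L}_{\mathrm{OR}}\right],
\qquad
\mathcal{L}_{\mathrm{OR}} = -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right),
\qquad
\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
$$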