---
license: apache-2.0
datasets:
- openbmb/UltraFeedback
language:
- en
---
This is a model released for our paper: [REBEL: Reinforcement Learning via Regressing Relative Rewards](https://arxiv.org/abs/2404.16767).
# REBEL-Llama-3
This model was trained with REBEL from [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), using [FsfairX-LLaMA3-RM-v0.1](https://huggingface.co/sfairXC/FsfairX-LLaMA3-RM-v0.1) as the reward model on the [UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset.
The training code is available at https://github.com/ZhaolinGao/REBEL.
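At its core, REBEL replaces the usual policy-gradient update with a least-squares regression: the difference in log-probability ratios between two responses is regressed onto their reward difference. A minimal sketch of that per-pair objective (scalar inputs and names are illustrative; see the paper and the training repo above for the actual implementation):

```python
def rebel_loss(logp_new_a, logp_new_b, logp_old_a, logp_old_b,
               reward_a, reward_b, eta=1.0):
    """Squared-error REBEL objective for one response pair (a, b):
    regress (1/eta) * [log-ratio(a) - log-ratio(b)] onto r(a) - r(b)."""
    ratio_diff = (logp_new_a - logp_old_a) - (logp_new_b - logp_old_b)
    target = reward_a - reward_b
    return (ratio_diff / eta - target) ** 2

# Toy check: when the log-ratio gap already matches eta times the
# reward gap, the regression loss is zero.
loss = rebel_loss(logp_new_a=0.5, logp_new_b=0.0,
                  logp_old_a=0.0, logp_old_b=0.0,
                  reward_a=0.5, reward_b=0.0, eta=1.0)
```

In practice this loss is averaged over a batch of prompt/response pairs and minimized with a standard optimizer; no value function or clipping is needed.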
### Links to Other Models
- [REBEL-OpenChat-3.5](https://huggingface.co/Cornell-AGI/REBEL-OpenChat-3.5)
- [REBEL-Llama-3-epoch_2](https://huggingface.co/Cornell-AGI/REBEL-Llama-3-epoch_2)
### AlpacaEval 2.0 Evaluations
| Model | AlpacaEval 2.0<br>LC Win Rate | AlpacaEval 2.0<br>Win Rate |
| :--------: | :--------: | :--------: |
| REBEL-OpenChat-3.5| 17.3 | 12.8 |
| REBEL-Llama-3 | 30.1 | 32.6 |
| REBEL-Llama-3-epoch_2| 31.33 | 34.22 |
### MT-Bench Evaluations
| Model | MT-Bench<br>1st Turn | MT-Bench<br>2nd Turn | MT-Bench<br>Average |
| :--------: | :--------: | :--------: | :--------: |
| REBEL-OpenChat-3.5 | 8.54 | 7.58 | 8.06 |
| REBEL-Llama-3 | 8.63 | 7.69 | 8.16 |
### Open LLM Leaderboard Evaluations
| Model | MMLU<br>(5-shot) | GSM8K<br>(5-shot) | ARC<br>(25-shot) | Winogrande<br>(5-shot) | TruthfulQA<br>(0-shot) | HellaSwag<br>(10-shot) | Average |
| :--------: | :--------: | :--------: | :--------: | :--------: | :--------: | :--------: | :--------: |
| REBEL-OpenChat-3.5 | 63.7 | 68.8 | 64.3 | 80.4 | 48.2 | 85.0 | 68.4 |
| REBEL-Llama-3 | 65.8 | 75.6 | 61.7 | 75.8 | 51.7 | 78.8 | 68.2 |
## Citation
Please cite our paper if you use this model in your own work:
```bibtex
@misc{gao2024rebel,
title={REBEL: Reinforcement Learning via Regressing Relative Rewards},
author={Zhaolin Gao and Jonathan D. Chang and Wenhao Zhan and Owen Oertell and Gokul Swamy and Kianté Brantley and Thorsten Joachims and J. Andrew Bagnell and Jason D. Lee and Wen Sun},
year={2024},
eprint={2404.16767},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```