---
license: apache-2.0
---
|
# Towards Efficient Exact Optimization of Language Model Alignment
|
|
|
- **Model**: [exo-imdb-reward-model](https://huggingface.co/ehzoah/exo-imdb-reward-model)
|
|
|
- **Finetuned from**: [gpt2-large](https://huggingface.co/openai-community/gpt2-large)
|
|
|
- **Dataset**: [imdb](https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz) (original Stanford version)
|
|
|
This is the reward model used in the IMDB experiment of the ICML'24 paper [*Towards Efficient Exact Optimization of Language Model Alignment*](https://arxiv.org/pdf/2402.00856).
|
|
|
For details on the dataset, training, and inference of this model, please refer to https://github.com/haozheji/exact-optimization/blob/main/exp/imdb_exp/README.md
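
Below is a minimal sketch of how one might score text with this checkpoint using the `transformers` library. It assumes the model loads as a GPT-2 sequence-classification head producing a single scalar reward per input; if the checkpoint instead uses a custom value head, follow the loading and inference code in the repository linked above.

```python
# Minimal sketch (assumption: the checkpoint is compatible with
# AutoModelForSequenceClassification and outputs one scalar logit per input).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "ehzoah/exo-imdb-reward-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# GPT-2 has no pad token by default; reuse EOS for batched padding.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    model.config.pad_token_id = tokenizer.eos_token_id

texts = [
    "This movie was an absolute delight from start to finish.",
    "A tedious, poorly acted film that I could not sit through.",
]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    # The scalar logit is interpreted as the reward of each text.
    rewards = model(**inputs).logits.squeeze(-1)

for text, reward in zip(texts, rewards.tolist()):
    print(f"{reward:+.3f}  {text}")
```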