Efficient Exact Optimization
Collection
SFT & Reward Models used in the experiments of the ICML 2024 paper "Towards Efficient Exact Optimization of Language Model Alignment"
model: exo-imdb-reward-model
dataset: imdb (original Stanford version)
Reward model used in the IMDB experiment of the ICML 2024 paper "Towards Efficient Exact Optimization of Language Model Alignment".
For details on the dataset, training, and inference for this model, see https://github.com/haozheji/exact-optimization/blob/main/exp/imdb_exp/README.md