ehzoah/exo-imdb-reward-model

license: apache-2.0

Towards Efficient Exact Optimization of Language Model Alignment

model: exo-imdb-reward-model
- Fine-tuned from model: gpt2-large
dataset: imdb (original Stanford version)

This is the reward model used in the imdb experiment of the ICML'24 paper "Towards Efficient Exact Optimization of Language Model Alignment".

For details on the dataset, training, and inference of this model, please refer to https://github.com/haozheji/exact-optimization/blob/main/exp/imdb_exp/README.md
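
As a convenience, the repository files can be fetched locally before following the linked instructions. The snippet below is a minimal sketch, not the authors' documented workflow: it assumes the `huggingface_hub` package is installed, and the helper name `download_reward_model` is hypothetical. How the checkpoint is then loaded and scored is described in the linked README and is not shown here.

```python
from pathlib import Path

# Repository id as shown on the model page.
REPO_ID = "ehzoah/exo-imdb-reward-model"


def download_reward_model(cache_dir=None):
    """Download all files of the model repository and return the local folder.

    Hypothetical helper for illustration; requires network access when called.
    """
    # Imported lazily so this module can be imported without huggingface_hub.
    from huggingface_hub import snapshot_download

    # snapshot_download returns the path of the local snapshot directory.
    local_dir = snapshot_download(repo_id=REPO_ID, cache_dir=cache_dir)
    return Path(local_dir)
```

Calling `download_reward_model()` returns the local directory containing the checkpoint files, which can then be passed to the scripts described in the linked README.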