---
license: apache-2.0
---
|
# Towards Efficient Exact Optimization of Language Model Alignment
|
|
|
- **Model**: [exo-imdb-reward-model](https://huggingface.co/ehzoah/exo-imdb-reward-model)
|
|
|
- **Finetuned from**: [gpt2-large](https://huggingface.co/openai-community/gpt2-large)
|
|
|
- **Dataset**: [imdb](https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz) (original Stanford version)
|
|
|
This is the reward model used in the IMDB experiment of the ICML'24 paper [*Towards Efficient Exact Optimization of Language Model Alignment*](https://arxiv.org/pdf/2402.00856).
|
|
|
For details on the dataset, training, and inference of this model, please refer to https://github.com/haozheji/exact-optimization/blob/main/exp/imdb_exp/README.md
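
Below is a minimal sketch of how one might score text with this checkpoint using the `transformers` library. It assumes the model loads as a GPT-2 sequence-classification head producing a single scalar reward per input; if the checkpoint instead uses a custom value head, follow the loading and inference code in the repository linked above.

```python
# Minimal sketch (assumption: the checkpoint is compatible with
# AutoModelForSequenceClassification and outputs one scalar logit per input).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "ehzoah/exo-imdb-reward-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# GPT-2 has no pad token by default; reuse EOS for batched padding.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    model.config.pad_token_id = tokenizer.eos_token_id

texts = [
    "This movie was an absolute delight from start to finish.",
    "A tedious, poorly acted film that I could not sit through.",
]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    # The scalar logit is interpreted as the reward of each text.
    rewards = model(**inputs).logits.squeeze(-1)

for text, reward in zip(texts, rewards.tolist()):
    print(f"{reward:+.3f}  {text}")
```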