Tool-Augmented Reward Models
Collection
[ICLR'24 Spotlight] Tool-Augmented Reward Modeling
Official checkpoint for Tool-Augmented Reward Modeling (ICLR 2024 Spotlight).
Themis is a tool-augmented preference model that addresses the limitations of conventional reward models (RMs) by giving them access to external environments, including calculators and search engines. It was introduced in the ICLR 2024 paper "Tool-Augmented Reward Modeling" and first released in this repository. Themis-7b is trained with TARA and achieves an overall improvement of 17.7% across eight tasks in preference ranking.
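To make the idea of a tool-augmented preference judgment concrete, here is a minimal toy sketch. It is not the Themis implementation and does not load the released checkpoint: the names `calculator`, `tool_augmented_score`, and `prefer` are hypothetical, and the "reward" is a simple rule that checks an answer against a calculator result, whereas Themis is a trained model.

```python
import ast
import operator
import re

# Toy illustration only: hypothetical helpers, NOT the Themis code or API.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> float:
    """External 'tool': safely evaluate a basic arithmetic expression."""

    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)


def tool_augmented_score(expression: str, answer: str) -> float:
    """Score 1.0 if the last number in the answer matches the tool result, else 0.0."""
    observation = calculator(expression)  # tool call
    numbers = re.findall(r"-?\d+(?:\.\d+)?", answer)
    if not numbers:
        return 0.0
    return 1.0 if abs(float(numbers[-1]) - observation) < 1e-9 else 0.0


def prefer(expression: str, answer_a: str, answer_b: str) -> str:
    """Rank two candidate answers using the tool-grounded score."""
    score_a = tool_augmented_score(expression, answer_a)
    score_b = tool_augmented_score(expression, answer_b)
    if score_a == score_b:
        return "tie"
    return "A" if score_a > score_b else "B"


if __name__ == "__main__":
    print(prefer("12 * 7 + 5", "The result is 89", "The result is 79"))
```

The point of the sketch is only the control flow: the scorer consults an external tool before ranking two candidate answers, rather than relying on what it has memorized.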
baidu/Themis-7b
Stay tuned!

@inproceedings{tarm-2024-ernie,
  author    = {Lei Li and
               Yekun Chai and
               Shuohuan Wang and
               Yu Sun and
               Hao Tian and
               Ningyu Zhang and
               Hua Wu},
  title     = {Tool-Augmented Reward Modeling},
  booktitle = {The Twelfth International Conference on Learning Representations (ICLR)},
  year      = {2024},
  url       = {https://openreview.net/forum?id=d94x0gWTUX},
}