Official checkpoint for Tool-Augmented Reward Modeling (ICLR 2024 spotlight).

Model Description

Conventional reward models (RMs) often struggle with basic capabilities such as arithmetic computation, code execution, and factual lookup. Themis is a tool-augmented preference model that addresses these limitations by giving RMs access to external environments, including calculators and search engines. It was introduced in the ICLR 2024 paper Tool-Augmented Reward Modeling and first released in this repository. Themis-7b is trained on the TARA dataset and achieves an overall improvement of 17.7% across eight preference-ranking tasks.
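To make the idea of a tool-augmented reward model concrete, here is a minimal, self-contained Python sketch. It is not Themis's implementation: `toy_planner` and `toy_scorer` are hypothetical hand-written stand-ins for what the model itself would generate (tool calls and a final judgment), and the only tool is a toy calculator. The reward function first invokes a tool, records the observation, and only then scores the answer, so a wrong arithmetic answer is caught by the tool rather than by the model's own computation. The last part assumes pairwise ranking accuracy (the fraction of pairs in which the preferred answer scores higher) as the preference-ranking metric.

```python
import ast
import operator
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Toy calculator tool: evaluates +, -, *, / over numeric literals without eval().
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> str:
    def evaluate(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")

    return str(evaluate(ast.parse(expression, mode="eval").body))


@dataclass
class ToolStep:
    tool: str
    tool_input: str
    observation: str


def tool_augmented_reward(question, answer, planner, scorer, tools) -> float:
    """Run the planned tool calls first, then score the answer given the observations."""
    trajectory: List[ToolStep] = []
    for tool_name, tool_input in planner(question, answer):
        observation = tools[tool_name](tool_input)
        trajectory.append(ToolStep(tool_name, tool_input, observation))
    return scorer(question, answer, trajectory)


def toy_planner(question: str, answer: str) -> List[Tuple[str, str]]:
    # Hypothetical stand-in for model-generated tool calls:
    # pull the expression out of a question shaped like "What is <expr>?".
    expression = question.removeprefix("What is ").rstrip("?")
    return [("Calculator", expression)]


def toy_scorer(question: str, answer: str, trajectory: List[ToolStep]) -> float:
    # Hypothetical stand-in for a learned reward head:
    # +1 if the last number in the answer equals the tool result, else -1.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", answer)
    if not numbers or not trajectory:
        return -1.0
    return 1.0 if float(numbers[-1]) == float(trajectory[-1].observation) else -1.0


TOOLS: Dict[str, Callable[[str], str]] = {"Calculator": calculator}


def reward_fn(question: str, answer: str) -> float:
    return tool_augmented_reward(question, answer, toy_planner, toy_scorer, TOOLS)


def ranking_accuracy(pairs, reward) -> float:
    """Fraction of (question, chosen, rejected) triples where chosen outscores rejected."""
    wins = sum(reward(q, chosen) > reward(q, rejected) for q, chosen, rejected in pairs)
    return wins / len(pairs)


r_chosen = reward_fn("What is 12 * 7?", "12 times 7 is 84.")
r_rejected = reward_fn("What is 12 * 7?", "12 times 7 is 82.")
print(r_chosen, r_rejected)  # 1.0 -1.0

pairs = [
    ("What is 12 * 7?", "12 times 7 is 84.", "12 times 7 is 82."),
    ("What is 10 / 4?", "It is 2.5.", "It is 2."),
]
print(ranking_accuracy(pairs, reward_fn))  # 1.0
```

In a real tool-augmented RM, the tool calls and the final judgment would both come from the model, and further tools (for example a search engine) would be registered alongside the calculator; those components are outside the scope of this sketch.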

🔥 News

9 February, 2024: 🎉 We released the official codebase and model weights of baidu/Themis-7b. Stay tuned! 🔥
16 January, 2024: 🎉 Our work has been accepted to ICLR 2024 as a spotlight paper! ✨

Citation

@inproceedings{tarm-2024-ernie,
  author = {Lei Li and
            Yekun Chai and
            Shuohuan Wang and
            Yu Sun and
            Hao Tian and
            Ningyu Zhang and
            Hua Wu},
  title = {Tool-Augmented Reward Modeling},
  booktitle = {The Twelfth International Conference on Learning Representations (ICLR)},
  year = {2024},
  url = {https://openreview.net/forum?id=d94x0gWTUX},
}

