Ray2333/reward-model-Mistral-7B-instruct-Unified-Feedback


Ray2333 committed on Sep 1

Commit 0f56399 • 1 Parent(s): 1b50cd7

Update README.md

Files changed (1)

README.md +10 -1

README.md CHANGED

@@ -73,7 +73,16 @@ with torch.no_grad():
 ```
-## To be added ...

 ```
+## Citation
+This reward model is used as a gold reward model for the following research https://arxiv.org/abs/2406.10216. If you find this model helpful for your research, please cite
+```
+@article{yang2024regularizing,
+  title={Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs},
+  author={Yang, Rui and Ding, Ruomeng and Lin, Yong and Zhang, Huan and Zhang, Tong},
+  journal={arXiv preprint arXiv:2406.10216},
+  year={2024}
+}
+```