Update README.md
README.md
CHANGED
@@ -24,11 +24,14 @@ The Unified-Feedback dataset contains diverse preference data from prior open-so
* openbmb/UltraFeedback
* argilla/ultrafeedback-binarized-preferences-cleaned
* berkeley-nest/Nectar.

+## Training Code and Blog
+
+We have merged the training script into https://github.com/WeiXiongUST/RLHF-Reward-Modeling; it is based on the [trl](https://github.com/huggingface/trl) package. In addition, this [blog](https://www.notion.so/Reward-Modeling-for-RLHF-abe03f9afdac42b9a5bee746844518d0?pvs=4) introduces the basics of reward modeling and shares lessons from our experiments.
+

## Evaluation
-We evaluate this reward model on the [reward model benchmark](https://huggingface.co/spaces/allenai/reward-bench), which demonstrates that this model is
+We evaluate this reward model on the [reward model benchmark](https://huggingface.co/spaces/allenai/reward-bench), which demonstrates that it is close to the **current best 7B reward model** and outperforms prior SOTA reward models such as openbmb/UltraRM-13b and berkeley-nest/Starling-RM-7B-alpha.

| Model | Average | Chat | Chat Hard | Safety | Reasoning | Prior Sets |
|:-------------------------:|:-------------:|:---------:|:---------:|:--------:|:-----------:|:---------------------:|
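For readers who want a feel for the trl-based training referenced in the added "Training Code and Blog" section, the sketch below shows pairwise reward-model training with trl's `RewardTrainer`. The base model, dataset id, column names, and hyperparameters are illustrative placeholders, not the settings used for this model; the actual training script is in the RLHF-Reward-Modeling repository linked above.

```python
# Minimal sketch of pairwise reward-model training with trl's RewardTrainer.
# Base model, dataset id, and column names are placeholders; see the linked
# RLHF-Reward-Modeling repository for the real script and configuration.
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

base_model = "mistralai/Mistral-7B-v0.1"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# A reward model is a sequence classifier with a single scalar output head.
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=1)
model.config.pad_token_id = tokenizer.pad_token_id

# Placeholder preference dataset with plain-text "chosen" / "rejected" columns.
raw = load_dataset("your-org/preference-data", split="train")

def tokenize_pair(example):
    # RewardTrainer expects tokenized chosen/rejected pairs under these keys.
    chosen = tokenizer(example["chosen"], truncation=True, max_length=2048)
    rejected = tokenizer(example["rejected"], truncation=True, max_length=2048)
    return {
        "input_ids_chosen": chosen["input_ids"],
        "attention_mask_chosen": chosen["attention_mask"],
        "input_ids_rejected": rejected["input_ids"],
        "attention_mask_rejected": rejected["attention_mask"],
    }

train_dataset = raw.map(tokenize_pair, remove_columns=raw.column_names)

trainer = RewardTrainer(
    model=model,
    tokenizer=tokenizer,  # newer trl versions take processing_class instead
    args=RewardConfig(
        output_dir="reward-model",
        per_device_train_batch_size=1,
        num_train_epochs=1,
        max_length=2048,
    ),
    train_dataset=train_dataset,
)
trainer.train()  # optimizes -log(sigmoid(r_chosen - r_rejected))
```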