RewardBench results?
#2 opened by Avelina
I think it would be useful for the model to be evaluated on RewardBench and the results published.
This would help researchers and developers see how large the gap is between this RM and the 70B Llama 3 RM, and so evaluate the price/quality tradeoff of using either model.
Hi, thank you for your interest in this model! Compared to https://huggingface.co/nvidia/Llama3-70B-SteerLM-RM, this model has an approximately 15% lower overall RewardBench score, partly due to the older and smaller Llama2 base model and partly due to the reward modeling datasets used. We highly recommend using the 70B Llama 3 RM over this model.