RewardBench results?

by Avelina - opened Jul 12

Jul 12

I think it would be useful for the model to be evaluated on RewardBench and the results published.

This would help researchers and developers gauge the gap between this RM and the 70B Llama 3 RM, and so evaluate the price/quality tradeoff of using either model.

zhilinw

NVIDIA org Jul 15

Hi, thank you for your interest in this model! Compared to https://huggingface.co/nvidia/Llama3-70B-SteerLM-RM, this model has an approximately 15% lower overall RewardBench score, partly due to the older and smaller Llama2 base model and partly due to the reward modeling datasets used. We highly recommend using the 70B Llama 3 RM over this model.
