Cannot verify benchmark results
#4
by Lexski - opened
On the model card it says the model gets

> AlpacaEval 2 LC of 57.6 and GPT-4-Turbo MT-Bench of 8.98

and

> Llama-3.1-Nemotron-70B-Instruct performs best on Arena Hard, AlpacaEval 2 LC (verified tab) and MT Bench (GPT-4-Turbo)
I tried following the links, but I cannot verify the results. The AlpacaEval 2.0 link does show the leaderboard, but this Nemotron model does not appear on it. The MT-Bench link takes me to a GitHub PR that doesn't mention GPT-4-Turbo or the Nemotron model.
Do you understand Bulgarian?
Those benchmarks were run internally, so it's expected that you can't find those numbers online:
- The AlpacaEval 2.0 link is there so people can compare against the official leaderboard
- The MT-Bench link is for people who may want to run this benchmark themselves, since doing so requires the changes from that PR (see the sketch after this list)
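
For anyone who does want to reproduce the MT-Bench number, the snippet below is a minimal sketch of FastChat's llm_judge flow, assuming the PR's changes have been applied. The model path, the local model ID, and the `gpt-4-turbo` judge name are assumptions for illustration, not values confirmed in this thread.

```python
# Hypothetical sketch: reproduce the MT-Bench (GPT-4-Turbo) score with
# FastChat's llm_judge scripts, run from the fastchat/llm_judge directory.
# Assumes the linked PR's changes are applied; names below are placeholders.
import subprocess

MODEL_PATH = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"  # assumed HF repo id
MODEL_ID = "nemotron-70b-instruct"  # arbitrary local label for the answer file

# 1. Generate the model's answers to the MT-Bench questions.
subprocess.run(
    ["python", "gen_model_answer.py",
     "--model-path", MODEL_PATH,
     "--model-id", MODEL_ID],
    check=True,
)

# 2. Judge the answers. Using GPT-4-Turbo as judge is what the PR is assumed
#    to enable; the exact --judge-model value depends on that PR.
subprocess.run(
    ["python", "gen_judgment.py",
     "--model-list", MODEL_ID,
     "--judge-model", "gpt-4-turbo"],  # assumed judge name from the PR
    check=True,
)

# 3. Aggregate and print the per-model score.
subprocess.run(
    ["python", "show_result.py", "--judge-model", "gpt-4-turbo"],
    check=True,
)
```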