Cannot verify benchmark results
#4
by Lexski - opened
On the model card it says the model gets

> AlpacaEval 2 LC of 57.6 and GPT-4-Turbo MT-Bench of 8.98

and

> Llama-3.1-Nemotron-70B-Instruct performs best on Arena Hard, AlpacaEval 2 LC (verified tab) and MT Bench (GPT-4-Turbo)
I tried following the links, but I cannot verify the results. The AlpacaEval 2.0 link does show the leaderboard, but this Nemotron model does not appear on it. The MT-Bench link takes me to a GitHub PR that doesn't mention GPT-4-Turbo or the Nemotron model.
Do you understand Bulgarian?
Those benchmarks were run internally, so it's expected that you can't find those numbers online:
- The AlpacaEval 2.0 link is there so people can compare against the official leaderboard
- The MT-Bench link is for people who may want to run this benchmark themselves, since doing so requires the changes from that PR (see the sketch after this list)
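
For anyone who does want to reproduce the MT-Bench number, the snippet below is a minimal sketch of FastChat's llm_judge flow, assuming the PR's changes have been applied. The model path, the local model ID, and the `gpt-4-turbo` judge name are assumptions for illustration, not values confirmed in this thread.

```python
# Hypothetical sketch: reproduce the MT-Bench (GPT-4-Turbo) score with
# FastChat's llm_judge scripts, run from the fastchat/llm_judge directory.
# Assumes the linked PR's changes are applied; names below are placeholders.
import subprocess

MODEL_PATH = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"  # assumed HF repo id
MODEL_ID = "nemotron-70b-instruct"  # arbitrary local label for the answer file

# 1. Generate the model's answers to the MT-Bench questions.
subprocess.run(
    ["python", "gen_model_answer.py",
     "--model-path", MODEL_PATH,
     "--model-id", MODEL_ID],
    check=True,
)

# 2. Judge the answers. Using GPT-4-Turbo as judge is what the PR is assumed
#    to enable; the exact --judge-model value depends on that PR.
subprocess.run(
    ["python", "gen_judgment.py",
     "--model-list", MODEL_ID,
     "--judge-model", "gpt-4-turbo"],  # assumed judge name from the PR
    check=True,
)

# 3. Aggregate and print the per-model score.
subprocess.run(
    ["python", "show_result.py", "--judge-model", "gpt-4-turbo"],
    check=True,
)
```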