Spaces: TIGER-Lab/MMLU-Pro

Ministral 3B results seem off

by patrickvonplaten - opened 18 days ago

Discussion

patrickvonplaten

18 days ago

Hey @wenhu,

Engineer from Mistral here!

Thanks for the nice leaderboard. Just a quick question - how did you obtain the results for the Ministral 3B model? They seem to be significantly off from what we measured internally - 10% is essentially random chance. Could you share the script you used to produce the benchmark results? We'd love to make sure things are reported correctly.

Thanks a lot!

pandora-s

15 days ago

Just in case, was the benchmarked model ministral/Ministral-3b-instruct? If so, it might be worth including the model author's name in the leaderboard; unfortunately, we have very similar model names.

ubowang

TIGER-Lab org 14 days ago

Thank you for bringing this to our attention. There seems to have been some confusion due to similar model names. We have removed the potentially confusing results to avoid any misrepresentation.

ubowang changed discussion status to closed 14 days ago
