Ministral 3B results seem off

#3
by patrickvonplaten - opened

Hey @wenhu,

Engineer from Mistral here!

Thanks for the nice leaderboard. Just a quick question: how did you obtain the results for the 3B Ministral model? They seem significantly lower than what we measured internally; 10% is essentially random chance. Could you share the script you used to produce the benchmark results? We'd love to make sure things are reported correctly.

Thanks a lot!

Just in case: was the benchmarked model ministral/Ministral-3b-instruct? If so, it might be worth adding the model author's name to the leaderboard, since, unfortunately, several models have very similar names.

TIGER-Lab org

Thank you for bringing this to our attention. There seems to have been some confusion due to similar model names. We have removed the potentially confusing results to avoid any misrepresentation.

ubowang changed discussion status to closed
