Running MMLU-Pro with Eleuther LM-Eval
I wanted to take a look at the MMLU-Pro implementation in LM-Eval and run some benchmarks to reproduce the leaderboard results, but it seems the current version doesn't have MMLU-Pro implemented as a task. There's a branch that looks to be in development, but it only contains a README.
Is there a fork of lm-eval somewhere that you use for the leaderboard? Ideally I'd like to get the exact code, dataset, few-shot exemplars, etc., to make a 1:1 comparison with the leaderboard results.
Thanks!
Hi!
Yes, there is a fork: @SaylorTwift has been dividing our code into lots of individual PRs to submit them to the harness, and we'll merge them all into our fork in the meantime. You'll find the first PR in the harness here, and we'll update the link to our fork once everything is merged into it.
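Once everything is merged, you should be able to run it like any other harness task. Here's a minimal sketch using lm-eval's Python API, assuming the task ends up registered under the name `mmlu_pro` and a 5-shot setup like the leaderboard uses; the model checkpoint is just a placeholder, not the leaderboard's exact pipeline:

```python
# Minimal sketch, assuming the in-progress task lands under the name
# "mmlu_pro" and that a 5-shot setup matches the leaderboard; the model
# checkpoint below is only a placeholder.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=meta-llama/Meta-Llama-3-8B-Instruct",
    tasks=["mmlu_pro"],  # hypothetical task name until the PRs are merged
    num_fewshot=5,
    batch_size="auto",
)
print(results["results"])
```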
Ah, perfect, thank you!