How did Falcon become #1? And isn't it a bit overrated?

#72
by Fredithefish - opened

The leaderboard doesn't allow trust_remote_code, so did it get published manually?
In my tests, Falcon often doesn't perform as well as it should.
On logic tasks, the 7B RedPajama-INCITE-Chat model often gave much better results.

Examples are in the screenshots:
image-1.png

image.png

My tests were run on the OpenChatKit HF Space and the falcon-chat HF Space.

Open LLM Leaderboard org

Yep, it got published manually!
Have you seen our blog posts on model scoring and MMLU? They could be interesting to you.

clefourrier changed discussion status to closed
