How did falcon become #1? And isn't it a bit overrated?

#72

by Fredithefish - opened Jun 17, 2023

Discussion

Fredithefish

Jun 17, 2023

The leaderboard doesn't allow trust_remote_code, so did Falcon get published manually?
In my tests, Falcon often doesn't perform as well as it should.
On logic tasks, the 7B RedPajama-INCITE-Chat model often gave much better results.

An example is in the screenshots.

My tests are based on the OpenChatKit HF Space and the falcon-chat HF Space.

clefourrier

Open LLM Leaderboard org Jul 3, 2023

Yep, it got published manually!
Did you see our blog post on model scoring and MMLU? It could be interesting to you.

clefourrier changed discussion status to closed Jul 13, 2023
