Bayesian Elo scores
#1 opened by passaglia
Hi H4 team,
Thanks for the great leaderboard. I'd like to suggest estimating the models' strengths with a Bayesian approach rather than the Elo update formula. It's straightforward to implement, and I have a notebook that does it (https://github.com/yuzu-ai/japanese-llm-ranking/blob/main/jrank/bradley-terry.ipynb). It gives optimal estimates of the model strengths together with Bayesian confidence regions. I'm happy to help with the implementation.
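As a rough illustration of the idea (not the code from the linked notebook), here is a minimal Python sketch that fits a Bradley-Terry model, where P(i beats j) = sigmoid(s_i - s_j), by finding the posterior mode (MAP) under an assumed Gaussian prior s_i ~ N(0, prior_sd²). Unlike the Elo update, it uses all games at once, so the result does not depend on the order of the battles. The function names, the `prior_sd` default, and the 1000-point offset on the Elo scale are assumptions made for this example, and ties are ignored. It returns only the posterior mode; the confidence regions mentioned above would need something more, such as MCMC sampling or a Laplace approximation, which is not shown here.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fit_bradley_terry_map(games, prior_sd=1.0, n_iters=2000):
    """Hypothetical helper: MAP Bradley-Terry strengths with a N(0, prior_sd^2) prior.

    games: non-empty list of (winner, loser) pairs; ties are not handled.
    Returns {model: strength} on the natural-log-odds scale.
    """
    models = sorted({m for game in games for m in game})
    games_per_model = {m: 0 for m in models}
    for winner, loser in games:
        games_per_model[winner] += 1
        games_per_model[loser] += 1
    # Step size 1/L, where L bounds the curvature of the log posterior:
    # the prior adds 1/prior_sd^2 and each game adds at most 1/4 per pairing.
    step = 1.0 / (1.0 / prior_sd**2 + 0.5 * max(games_per_model.values()))

    s = {m: 0.0 for m in models}
    for _ in range(n_iters):
        # Gradient of the log prior.
        grad = {m: -s[m] / prior_sd**2 for m in models}
        for winner, loser in games:
            # d/ds_w log sigmoid(s_w - s_l) = 1 - sigmoid(s_w - s_l)
            g = 1.0 - sigmoid(s[winner] - s[loser])
            grad[winner] += g
            grad[loser] -= g
        # Gradient ascent on the (concave) log posterior.
        for m in models:
            s[m] += step * grad[m]
    return s


def to_elo_scale(strengths, base=1000.0):
    # Elo uses P(i beats j) = 1 / (1 + 10**(-(R_i - R_j) / 400)),
    # so R = s * 400 / ln(10) reproduces the same win probabilities.
    return {m: base + s * 400.0 / math.log(10) for m, s in strengths.items()}


if __name__ == "__main__":
    battles = [("A", "B")] * 3 + [("B", "A")] + [("B", "C")] * 3 + [("C", "B")]
    for model, rating in sorted(to_elo_scale(fit_bradley_terry_map(battles)).items()):
        print(f"{model}: {rating:.1f}")
```

The prior acts as regularization: with few battles it pulls strengths toward zero, and it keeps the estimates finite even for a model that has never lost.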
Cheers,
Sam