human_eval_llm_leaderboard


Bayesian Elo scores

by passaglia - opened Jun 29, 2023


passaglia, Jun 29, 2023 (edited Jun 29, 2023)

Hi H4 team,

Thanks for the great leaderboard. I'd like to suggest estimating the model strengths with a Bayesian approach rather than the Elo update formula. This is really easy to do, and I have a notebook implementing it here: https://github.com/yuzu-ai/japanese-llm-ranking/blob/main/jrank/bradley-terry.ipynb. It gives you optimal estimates of the model strengths along with Bayesian confidence regions. I'm happy to help with the implementation.
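To give a rough idea of what this looks like, here is a minimal sketch. It is not the code from the notebook: the win counts are made up, the function names (`log_posterior`, `sample_posterior`, `summarize`, `to_elo_scale`) are hypothetical, and the Normal(0, 1) prior and the plain random-walk Metropolis sampler are assumptions I picked to keep it short. It fits a Bradley-Terry model, where P(i beats j) = sigmoid(s_i - s_j), and reports a posterior mean and 95% interval per model.

```python
import math
import random


def log_sigmoid(d):
    """Numerically stable log(1 / (1 + exp(-d)))."""
    if d >= 0:
        return -math.log1p(math.exp(-d))
    return d - math.log1p(math.exp(d))


def log_posterior(strengths, wins, prior_sd=1.0):
    """Unnormalized log posterior of Bradley-Terry log-strengths.

    wins[i][j] is the number of times model i beat model j.
    Assumption: independent Normal(0, prior_sd) prior on each strength.
    """
    lp = -sum(s * s for s in strengths) / (2.0 * prior_sd ** 2)
    n = len(strengths)
    for i in range(n):
        for j in range(n):
            if i != j and wins[i][j]:
                lp += wins[i][j] * log_sigmoid(strengths[i] - strengths[j])
    return lp


def sample_posterior(wins, n_samples=4000, burn_in=1000, step=0.3, seed=0):
    """Random-walk Metropolis sampler over the log-strengths."""
    rng = random.Random(seed)
    current = [0.0] * len(wins)
    current_lp = log_posterior(current, wins)
    samples = []
    for t in range(n_samples + burn_in):
        proposal = [s + rng.gauss(0.0, step) for s in current]
        proposal_lp = log_posterior(proposal, wins)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        if t >= burn_in:
            samples.append(current)
    return samples


def summarize(samples, names):
    """Posterior mean and central 95% interval for each model."""
    out = {}
    for k, name in enumerate(names):
        draws = sorted(s[k] for s in samples)
        mean = sum(draws) / len(draws)
        lo = draws[int(0.025 * len(draws))]
        hi = draws[int(0.975 * len(draws))]
        out[name] = (mean, lo, hi)
    return out


def to_elo_scale(strength, base=1000.0):
    """Map a log-strength to an Elo-like scale (400 points = 10:1 odds)."""
    return base + 400.0 / math.log(10.0) * strength


# Made-up head-to-head results for three hypothetical models.
names = ["model_a", "model_b", "model_c"]
wins = [
    [0, 8, 9],  # model_a beat model_b 8 times and model_c 9 times
    [2, 0, 7],
    [1, 3, 0],
]

samples = sample_posterior(wins)
summary = summarize(samples, names)
for name, (mean, lo, hi) in summary.items():
    print(f"{name}: {to_elo_scale(mean):.0f} "
          f"[{to_elo_scale(lo):.0f}, {to_elo_scale(hi):.0f}]")
```

Unlike sequential Elo updates, the result here does not depend on the order in which the comparisons were played, and the intervals show directly when two models are too close to separate. For a real leaderboard a proper sampler or optimizer would be a better choice than this hand-rolled Metropolis loop.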

Cheers,
Sam

Leo1016, Jul 27, 2023

This comment has been hidden
