Gemma-2-9B-it scores
I think something is wrong with Gemma-2-9B-it's MMLU-Pro score?
You can see in TIGER-Lab/MMLU-Pro that Gemma-2-9B-it slightly beats Phi3-Medium
But that is not the case on this leaderboard
There is something wrong with the MATH score as well: 0? Not possible.
Bump (Leaving a comment because I want a notification once more information is available).
Hi all!
Happy to say that we (most likely) found the problem! (Seems to work on the base model for the subsets I tested)
At this line of our harness fork, we needed to apply a patch (merged into the harness's main branch two weeks ago) so that Gemma-2 models also systematically start their evaluation with a prepended BOS token.
I just need to restart the evals and we should be getting updated results very soon.
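For anyone curious what the fix amounts to, here is a minimal sketch, assuming the standard `transformers` tokenizer API; the model id and prompt are just for illustration, and the exact harness code differs:

```python
# Sketch: Gemma-2 models expect a BOS token at the start of every
# prompt; if the harness tokenizes without it, scores collapse.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")

prompt = "Question: What is 2 + 2?\nAnswer:"

# Patched path: special tokens are added, so the sequence starts
# with the BOS token (Gemma tokenizers prepend BOS by default).
with_bos = tokenizer(prompt, add_special_tokens=True)["input_ids"]

# Buggy path: no special tokens, so no BOS token is prepended,
# which badly degrades Gemma-2's outputs.
without_bos = tokenizer(prompt, add_special_tokens=False)["input_ids"]

print(with_bos[0] == tokenizer.bos_token_id)     # True
print(without_bos[0] == tokenizer.bos_token_id)  # False
```

The gist is that the check for whether to prepend BOS has to cover Gemma-2 as well, rather than relying on every code path hitting the tokenizer's default behavior.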
That line is Gemma-specific; would this mean that non-Gemma models aren't affected by this problem? That's good news, can't wait to see the actual results.
Yep, Gemma models are a bit fickle if you don't launch them exactly as expected - it might also be affecting the RecurrentGemma models, which we are looking at atm
Hi! New results should be there! Thanks for your patience and the report! :)