Reproducibility issue

#2
by mlabonne - opened

Hi @zyh3826, I'm playing with mergekit and wanted to reproduce your results with this model. Unfortunately, I only got an average score of 48.54 (vs. your 73.3) on the Open LLM Leaderboard.

Did you take any extra steps, or is there something I might have missed? Thank you.

I'm trying the same layer combination with another model and getting complete gibberish. Amazing this even works at all.

It's weird because their model does perform very well on the Nous benchmark suite: https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard
