In TruthfulQA, v0.2 has the best score, but you claim v0.5 is best on TruthfulQA

#7
by rocke - opened

In Eval, v0.2 has the best TruthfulQA score, but you claim v0.5 is best on TruthfulQA.
It seems to be a tiny mistake?

Thanks for your great work!

Hi @rocke

Thanks for reporting this. I think I mixed up the eval columns and then wrote the claim based on that. I am going to remove that part, because most of those results are on the Leaderboard. (Soon there will be a PR to add the official eval.)

That said, this model is best in MMLU and GSM8K.
