In TruthfulQA, v0.2 has the best score, but you claim v0.5 is best on TruthfulQA
#7, opened by rocke
In Eval, on TruthfulQA, v0.2 has the best score, but you claim v0.5 is best on TruthfulQA.
It seems like a tiny mistake?
Thanks for your great work!
Hi @rocke
Thanks for reporting this. I think I mixed up the eval columns and then wrote that claim based on the wrong one. I am going to remove that part, because most of these results are already on the Leaderboard. (A PR adding the official eval will follow soon.)
That said, this model is the best on MMLU and GSM8K.