About MMLU evaluation

#12

by ldwang - opened 25 days ago

Discussion

ldwang

25 days ago

•

edited 25 days ago

Thank you for sharing.

Some models, such as Qwen1.5B and Phi1.5, typically use a 5-shot setting to measure MMLU.
cosmo-1b also used the same setting: https://huggingface.co/blog/cosmopedia#training-stack.

Can you explain why the MMLU evaluation here was changed to a zero-shot approach that uses the option content?

Thank you.

loubnabnl

Hugging Face TB Research org 24 days ago

•

edited 24 days ago

Hi, we now use the same evaluation setup for our internal projects (the same as for the FineWeb and FineWeb-Edu ablations), where we run all the benchmarks zero-shot.
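
For readers comparing the two setups discussed above, here is a minimal sketch of how the prompts might differ. It assumes that "option content" means the text of each option is scored as a continuation, instead of the answer letter. The function names, the prompt templates, and the `loglikelihood` callable are all hypothetical illustrations, not the format of any specific evaluation harness.

```python
from typing import Callable, List, Tuple

LETTERS = "ABCD"

def format_question(question: str, options: List[str], answer_idx=None) -> str:
    """Render one multiple-choice item with lettered options (illustrative template)."""
    lines = [question]
    lines += [f"{LETTERS[i]}. {opt}" for i, opt in enumerate(options)]
    # Solved demonstrations include the answer letter; the target question does not.
    lines.append("Answer:" if answer_idx is None else f"Answer: {LETTERS[answer_idx]}")
    return "\n".join(lines)

def few_shot_letter_prompt(shots: List[Tuple[str, List[str], int]],
                           question: str, options: List[str]) -> str:
    """k-shot setup: k solved demonstrations, then the target question.
    The model would be scored on which letter it prefers next."""
    blocks = [format_question(q, o, a) for q, o, a in shots]
    blocks.append(format_question(question, options))
    return "\n\n".join(blocks)

def zero_shot_cloze_candidates(question: str,
                               options: List[str]) -> List[Tuple[str, str]]:
    """Zero-shot setup: no demonstrations. Each candidate pairs the same
    context with the full option text as the continuation to score."""
    context = f"Question: {question}\nAnswer:"
    return [(context, " " + opt) for opt in options]

def pick_answer(candidates: List[Tuple[str, str]],
                loglikelihood: Callable[[str, str], float]) -> int:
    """Return the index of the highest-scoring continuation.
    `loglikelihood` is a hypothetical stand-in for a model scoring call."""
    scores = [loglikelihood(ctx, cont) for ctx, cont in candidates]
    return max(range(len(scores)), key=scores.__getitem__)
```

With 5 entries in `shots`, `few_shot_letter_prompt` gives the 5-shot variant. In the zero-shot variant the model never sees the lettered list, so the comparison is between the option texts themselves. Scores from the two setups are therefore not directly comparable.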
