HuggingFaceTB/cosmo-1b · What is the command used to evaluate on MMLU?

Feb 24

Thanks for open-sourcing the model and dataset, and congrats on the release!

May I ask which command was used to evaluate on MMLU?

I tried

accelerate launch --num_processes 8 -m lm_eval --model_args pretrained=HuggingFaceTB/cosmo-1b,dtype=bfloat16,use_flash_attention_2=True \
        --tasks mmlu --num_fewshot 5 \
        --batch_size 16

and get the following results:

Groups	Version	Filter	n-shot	Metric	Value		Stderr
mmlu	N/A	none	0	acc	0.2608	±	0.0397
- humanities	N/A	none	5	acc	0.2544	±	0.0289
- other	N/A	none	5	acc	0.2671	±	0.0414
- social_sciences	N/A	none	5	acc	0.2548	±	0.0401
- stem	N/A	none	5	acc	0.2699	±	0.0491
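
For context, here is a minimal sketch of how far these scores sit from random guessing. It assumes MMLU's four-option multiple-choice format, so the chance baseline is 0.25; the helper name `z_vs_chance` is hypothetical and only for illustration. The numbers are copied from the table above.

```python
# Scores copied from the results table: group -> (accuracy, standard error)
scores = {
    "mmlu": (0.2608, 0.0397),
    "humanities": (0.2544, 0.0289),
    "other": (0.2671, 0.0414),
    "social_sciences": (0.2548, 0.0401),
    "stem": (0.2699, 0.0491),
}

# Assumption: uniform guessing over four answer options.
CHANCE = 0.25


def z_vs_chance(acc, stderr, chance=CHANCE):
    """Return how many standard errors separate acc from the chance baseline."""
    return (acc - chance) / stderr


for group, (acc, stderr) in scores.items():
    print(f"{group:16s} acc={acc:.4f}  z={z_vs_chance(acc, stderr):+.2f}")
```

Under that assumption, every group is within one standard error of the chance baseline.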

PY007

Feb 24

Scores on OpenLLM leaderboard:

loubnabnl

Hugging Face TB Research org Mar 4

edited Mar 4

Thanks for pointing it out. The model was evaluated before we converted it from our training framework to transformers, so maybe something went wrong there. We'll run some tests.