What is the command used to evaluate on MMLU?
#3, opened by PY007
Thanks for open-sourcing the model and dataset, and congrats on the release!
May I ask which command was used to evaluate on MMLU?
I tried
accelerate launch --num_processes 8 -m lm_eval --model_args pretrained=HuggingFaceTB/cosmo-1b,dtype=bfloat16,use_flash_attention_2=True \
--tasks mmlu --num_fewshot 5 \
--batch_size 16
and get the following results:
Groups | Version | Filter | n-shot | Metric | Value | | Stderr |
---|---|---|---|---|---|---|---|
mmlu | N/A | none | 0 | acc | 0.2608 | ± | 0.0397 |
- humanities | N/A | none | 5 | acc | 0.2544 | ± | 0.0289 |
- other | N/A | none | 5 | acc | 0.2671 | ± | 0.0414 |
- social_sciences | N/A | none | 5 | acc | 0.2548 | ± | 0.0401 |
- stem | N/A | none | 5 | acc | 0.2699 | ± | 0.0491 |
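
For context, MMLU questions are four-option multiple choice, so random guessing gives an accuracy of about 0.25. The following is a minimal sketch, using only the numbers from the table above, that checks how many standard errors each reported score lies from that chance level. The helper name `distance_from_chance` is made up for illustration and is not part of lm_eval.

```python
# Accuracy and standard error per group, copied from the table above.
results = {
    "mmlu": (0.2608, 0.0397),
    "humanities": (0.2544, 0.0289),
    "other": (0.2671, 0.0414),
    "social_sciences": (0.2548, 0.0401),
    "stem": (0.2699, 0.0491),
}

# Assumption: four answer options per question, so chance accuracy is 0.25.
CHANCE = 0.25


def distance_from_chance(acc, stderr, chance=CHANCE):
    """Return how many standard errors the accuracy lies from chance."""
    return (acc - chance) / stderr


z_scores = {
    name: distance_from_chance(acc, stderr)
    for name, (acc, stderr) in results.items()
}

for name, z in z_scores.items():
    print(f"{name:16s} {z:+.2f} standard errors from chance")
```

Every group comes out less than one standard error above 0.25, so these results look consistent with the model guessing at random under this setup.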
Thanks for pointing this out. The model was evaluated before we converted it from our training framework to transformers, so something may have gone wrong. We'll run some tests.