Which benchmarks are used for the evaluation?
The SQuAD benchmark from the previous version has been removed in the latest one. Why?
Hi,
MMLU, HellaSwag and ARC are the main benchmarks because they are what the guys at Mistral use to evaluate their models in Italian.
There are other evals in the "eval aggiuntive" (additional evals) tab.
The SQuAD benchmark has been moved to the "classifica rag" (RAG leaderboard) tab.
We plan to add more and better evals in the future!
Thanks!
@FinancialSupport
Why not mention this in the documentation? That way, future readers like me would know that the SQuAD dataset is used for classification evaluation tasks.
I kinda didn't want to "publicize it" too much, since I know some people train on SQuAD and I wanted to avoid contamination!