MMLU Pro
More advanced and challenging multi-task evaluation
Compact LLM Battle Arena: Frugal AI Face-Off!
VLMEvalKit Eval Results in video understanding benchmark
Track, rank and evaluate open LLMs and chatbots