[2023.11.13] We are delighted to announce the release of OpenCompass v0.1.8. This version enables local loading of evaluation benchmarks, thereby eliminating the need for an internet connection. Please note that with this update, you must re-download all evaluation datasets to ensure accurate and up-to-date results.
[2023.11.06] We have supported several API-based models, including ChatGLM Pro@Zhipu, ABAB-Chat@MiniMax, and Xunfei. Welcome to the Models section for more details.
[2023.10.24] We have released a new benchmark for evaluating LLMs’ multi-turn dialogue capabilities. Welcome to BotChat for more details.
[2023.09.26] We update the leaderboard with Qwen, one of the best-performing open-source models currently available. Welcome to our homepage for more details.
[2023.09.20] We update the leaderboard with InternLM-20B. Welcome to our homepage for more details.
[2023.09.19] We update the leaderboard with WeMix-LLaMA2-70B/Phi-1.5-1.3B. Welcome to our homepage for more details.
[2023.09.08] We update the leaderboard with Baichuan-2/Tigerbot-2/Vicuna-v1.5. Welcome to our homepage for more details.
[2023.09.06] The Baichuan2 team adopts OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.
[2023.09.02] We have supported the evaluation of Qwen-VL in OpenCompass.
[2023.08.25] The TigerBot team adopts OpenCompass to evaluate their models systematically. We deeply appreciate the community's dedication to transparency and reproducibility in LLM evaluation.
[2023.08.21] Lagent has been released, a lightweight framework for building LLM-based agents. We are working with the Lagent team to support the evaluation of general tool-use capability. Stay tuned!
[2023.08.18] We have supported evaluation for multi-modality learning, including MMBench, SEED-Bench, COCO-Caption, Flickr-30K, OCR-VQA, ScienceQA, and more. A leaderboard is on the way. Feel free to try multi-modality evaluation with OpenCompass!
[2023.08.18] The dataset card is now online. You are welcome to contribute new evaluation benchmarks to OpenCompass!
[2023.08.11] Model comparison is now online. We hope this feature offers deeper insights!