BLINK: Multimodal Large Language Models Can See but Not Perceive Paper • 2404.12390 • Published Apr 18, 2024 • 23
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension Paper • 2404.16790 • Published Apr 25, 2024 • 7
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots Paper • 2405.07990 • Published May 13, 2024 • 15
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding Paper • 2406.09411 • Published Jun 2024 • 17
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark Paper • 2406.05967 • Published Jun 2024 • 5
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos Paper • 2406.08407 • Published Jun 2024 • 24
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation Paper • 2406.09961 • Published Jun 2024 • 54
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text Paper • 2406.08418 • Published Jun 2024 • 28
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages Paper • 2406.10118 • Published Jun 2024 • 25
VideoGUI: A Benchmark for GUI Automation from Instructional Videos Paper • 2406.10227 • Published Jun 2024 • 8
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs Paper • 2406.11833 • Published Jun 2024 • 61
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning Paper • 2406.12742 • Published Jun 2024 • 14
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models Paper • 2406.11230 • Published Jun 2024 • 34
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding Paper • 2406.14515 • Published Jun 2024 • 28
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models Paper • 2406.16338 • Published Jun 2024 • 23
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs Paper • 2406.18521 • Published Jun 2024 • 25
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? Paper • 2407.01284 • Published Jul 2024 • 68
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation Paper • 2407.00468 • Published Jul 2024 • 34
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding Paper • 2407.01791 • Published Jul 2024 • 4