M^3IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning Paper • 2306.04387 • Published Jun 7, 2023 • 8
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis Paper • 2405.21075 • Published May 31, 2024 • 19
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding Paper • 2312.02051 • Published Dec 4, 2023 • 1