OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text Paper • 2406.08418 • Published Jun 12 • 28
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 124
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI Paper • 2406.12753 • Published Jun 18 • 14
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models Paper • 2406.11230 • Published Jun 17 • 34
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens Paper • 2406.11271 • Published Jun 17 • 18