SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers Paper • 2407.09413 • Published Jul 12
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct Paper • 2409.05840 • Published Sep 9
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published Sep 19
LVD-2M: A Long-take Video Dataset with Temporally Dense Captions Paper • 2410.10816 • Published 23 days ago
Harnessing Webpage UIs for Text-Rich Visual Understanding Paper • 2410.13824 • Published 20 days ago