LLaVA-Onevision Collection LLaVa_Onevision models for single-image, multi-image, and video scenarios • 9 items • Updated Sep 18 • 12
view article Article Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model Aug 22, 2023 • 27
M5 -- A Diverse Benchmark to Assess the Performance of Large Multimodal Models Across Multilingual and Multicultural Vision-Language Tasks Paper • 2407.03791 • Published Jul 4 • 1
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 124
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27 • 44