ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling • Paper • 2402.06118 • Published Feb 9
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models • Paper • 2403.18814 • Published Mar 27
Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models • Paper • 2403.16999 • Published Mar 25
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs • Paper • 2404.16375 • Published Apr 25