AIMv2 Collection A collection of AIMv2 vision encoders covering several fixed resolutions, native resolution, and a distilled checkpoint. • 19 items • Updated 1 day ago • 44
Insight-V Collection Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models • 5 items • Updated 2 days ago • 5
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions Paper • 2411.14405 • Published 2 days ago • 30 • 3
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents Paper • 2411.06559 • Published 13 days ago • 10 • 2
WhisperNER Collection A collection of WhisperNER models for joint open-type NER and ASR • 3 items • Updated 23 days ago • 4