Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models • Paper • 2411.14432 • Published 3 days ago • 16
Hymba: A Hybrid-head Architecture for Small Language Models • Paper • 2411.13676 • Published 4 days ago • 33
Multimodal Autoregressive Pre-training of Large Vision Encoders • Paper • 2411.14402 • Published 3 days ago • 35
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization • Paper • 2411.10442 • Published 9 days ago • 53
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions • Paper • 2411.14405 • Published 3 days ago • 35
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents • Paper • 2411.06559 • Published 14 days ago • 10
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training • Paper • 2411.13476 • Published 4 days ago • 12
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation • Paper • 2411.13281 • Published 4 days ago • 15
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models • Paper • 2411.13503 • Published 4 days ago • 24
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration • Paper • 2411.10958 • Published 8 days ago • 44
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs • Paper • 2411.02571 • Published 20 days ago • 1
SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning • Paper • 2411.10161 • Published 9 days ago • 6
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements • Paper • 2411.12044 • Published 6 days ago • 13
RedPajama: an Open Dataset for Training Large Language Models • Paper • 2411.12372 • Published 6 days ago • 44
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization • Paper • 2411.11909 • Published 8 days ago • 20
Drowning in Documents: Consequences of Scaling Reranker Inference • Paper • 2411.11767 • Published 6 days ago • 16
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices • Paper • 2411.10640 • Published 9 days ago • 39
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use • Paper • 2411.10323 • Published 9 days ago • 27