-
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Paper • 2401.14405 • Published • 11 -
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
Paper • 2406.18521 • Published • 25 -
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
Paper • 2408.12590 • Published • 33 -
Law of Vision Representation in MLLMs
Paper • 2408.16357 • Published • 92
Collections
Discover the best community collections!
Collections including paper arxiv:2401.14405
-
Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
Paper • 2312.06134 • Published • 2 -
Efficient Monotonic Multihead Attention
Paper • 2312.04515 • Published • 6 -
Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 37 -
Exploring Format Consistency for Instruction Tuning
Paper • 2307.15504 • Published • 7
-
YOLO-World: Real-Time Open-Vocabulary Object Detection
Paper • 2401.17270 • Published • 32 -
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Paper • 2401.14405 • Published • 11 -
Improving fine-grained understanding in image-text pre-training
Paper • 2401.09865 • Published • 15 -
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Paper • 2404.15653 • Published • 26
-
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
Paper • 2401.13919 • Published • 23 -
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Paper • 2401.14405 • Published • 11 -
Design2Code: How Far Are We From Automating Front-End Engineering?
Paper • 2403.03163 • Published • 93 -
LLM Agent Operating System
Paper • 2403.16971 • Published • 64
-
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
Paper • 2312.08344 • Published • 9 -
Diffusion Priors for Dynamic View Synthesis from Monocular Videos
Paper • 2401.05583 • Published • 7 -
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Paper • 2401.14405 • Published • 11