-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 178 -
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Paper • 2401.00849 • Published • 14 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 45 -
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper • 2311.00571 • Published • 40
Collections
Discover the best community collections!
Collections including paper arxiv:2401.00908
-
PDFTriage: Question Answering over Long, Structured Documents
Paper • 2309.08872 • Published • 53 -
Adapting Large Language Models via Reading Comprehension
Paper • 2309.09530 • Published • 75 -
Table-GPT: Table-tuned GPT for Diverse Table Tasks
Paper • 2310.09263 • Published • 39 -
Context-Aware Meta-Learning
Paper • 2310.10971 • Published • 16
-
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • 2312.00752 • Published • 138 -
Elucidating the Design Space of Diffusion-Based Generative Models
Paper • 2206.00364 • Published • 13 -
GLU Variants Improve Transformer
Paper • 2002.05202 • Published • 1 -
StarCoder 2 and The Stack v2: The Next Generation
Paper • 2402.19173 • Published • 133
-
Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
Paper • 2310.16527 • Published • 2 -
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 178 -
Unifying Vision, Text, and Layout for Universal Document Processing
Paper • 2212.02623 • Published • 10