EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published
Collections including paper arxiv:2403.12895
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
Paper • 2403.12895 • Published
MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Paper • 2408.01800 • Published
Phantom of Latent for Large Language and Vision Models
Paper • 2409.14713 • Published