ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth Paper • 2302.12288 • Published Feb 23, 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation Paper • 2201.12086 • Published Jan 28, 2022
Multimodal Foundation Models: From Specialists to General-Purpose Assistants Paper • 2309.10020 • Published Sep 18, 2023
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) Paper • 2309.17421 • Published Sep 29, 2023
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V Paper • 2310.11441 • Published Oct 17, 2023