LLaVA-o1: Let Vision Language Models Reason Step-by-Step • Paper • 2411.10440 • Published 9 days ago • 95
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation • Paper • 2310.16656 • Published Oct 25, 2023 • 40
HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image • Paper • 2312.04543 • Published Dec 7, 2023 • 21
Lenna: Language Enhanced Reasoning Detection Assistant • Paper • 2312.02433 • Published Dec 5, 2023 • 2