Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27 • 88
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework Paper • 2403.13248 • Published Mar 20 • 77
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents Paper • 2311.05437 • Published Nov 9, 2023 • 47
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models Paper • 2409.20551 • Published Sep 30 • 13
Visual Question Decomposition on Multimodal Large Language Models Paper • 2409.19339 • Published Sep 28 • 7
FreeInit: Bridging Initialization Gap in Video Diffusion Models Paper • 2312.07537 • Published Dec 12, 2023 • 26