DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding Paper • 2411.14347 • Published 4 days ago • 8
Training-free Regional Prompting for Diffusion Transformers Paper • 2411.02395 • Published 21 days ago • 24
How Far is Video Generation from World Model: A Physical Law Perspective Paper • 2411.02385 • Published 21 days ago • 32
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents Paper • 2410.23218 • Published 26 days ago • 46
Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning Paper • 2410.22304 • Published 27 days ago • 15
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs Paper • 2311.04901 • Published Nov 8, 2023 • 7