Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents Paper • 2411.06559 • Published 11 days ago • 9 • 1
AnimateAnything: Consistent and Controllable Animation for Video Generation Paper • 2411.10836 • Published 5 days ago • 17 • 2
Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts Paper • 2411.10669 • Published 6 days ago • 9 • 2
SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers Paper • 2411.10510 • Published 6 days ago • 8 • 2
FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on Paper • 2411.10499 • Published 6 days ago • 9 • 2
Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement Paper • 2411.06558 • Published 11 days ago • 29 • 6
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use Paper • 2411.10323 • Published 6 days ago • 26 • 2
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published 6 days ago • 87 • 6
Cut Your Losses in Large-Vocabulary Language Models Paper • 2411.09009 • Published 8 days ago • 37 • 4