The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use • Paper • 2411.10323 • Published 6 days ago • 26
GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation • Paper • 2411.08033 • Published 9 days ago • 21
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models • Paper • 2411.09595 • Published 7 days ago • 65
MagicQuill: An Intelligent Interactive Image Editing System • Paper • 2411.09703 • Published 7 days ago • 50
BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions • Paper • 2411.07461 • Published 10 days ago • 21
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion • Paper • 2411.04928 • Published 14 days ago • 47
GameGen-X: Interactive Open-world Game Video Generation • Paper • 2411.00769 • Published 20 days ago • 2
Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning • Paper • 2410.21845 • Published 23 days ago • 11
ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting • Paper • 2410.17856 • Published 29 days ago • 49
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality • Paper • 2410.19355 • Published 28 days ago • 23
High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching • Paper • 2407.03648 • Published Jul 4 • 16
Unbounded: A Generative Infinite Game of Character Life Simulation • Paper • 2410.18975 • Published 28 days ago • 34
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding • Paper • 2403.11481 • Published Mar 18 • 12
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos • Paper • 2409.02095 • Published Sep 3 • 35
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations • Paper • 2410.10792 • Published Oct 14 • 26
Animate-X: Universal Character Image Animation with Enhanced Motion Representation • Paper • 2410.10306 • Published Oct 14 • 52