Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs · 8 authors · submitted by akhaliq · 58 upvotes · 3 comments
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators · 9 authors · submitted by akhaliq · 53 upvotes · 2 comments
ByteEdit: Boost, Comply and Accelerate Generative Image Editing · 14 authors · submitted by akhaliq · 24 upvotes · 1 comment
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing · 10 authors · submitted by akhaliq · 23 upvotes
BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion · 5 authors · submitted by akhaliq · 20 upvotes
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding · 8 authors · submitted by akhaliq · 18 upvotes
PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations · 11 authors · submitted by akhaliq · 14 upvotes
MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation · 6 authors · submitted by akhaliq · 12 upvotes · 1 comment
Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models · 5 authors · submitted by akhaliq · 11 upvotes