InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published about 24 hours ago
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models Paper • 2412.07674 • Published 3 days ago
Imagine360: Immersive 360 Video Generation from Perspective Anchor Paper • 2412.03552 • Published 9 days ago
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree Paper • 2410.16268 • Published Oct 21
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model Paper • 2401.16420 • Published Jan 29
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want Paper • 2312.03818 • Published Dec 6, 2023
HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image Paper • 2312.04543 • Published Dec 7, 2023