Collections including paper arxiv:2312.09911
- A Novel 1D State Space for Efficient Music Rhythmic Analysis
  Paper • 2111.00704 • Published
- Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
  Paper • 2312.09911 • Published • 53
- Music Style Transfer with Time-Varying Inversion of Diffusion Models
  Paper • 2402.13763 • Published • 10
- ChatMusician: Understanding and Generating Music Intrinsically with LLM
  Paper • 2402.16153 • Published • 56

- Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation
  Paper • 2211.06687 • Published • 3
- EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
  Paper • 2401.17690 • Published • 5
- Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
  Paper • 2312.09911 • Published • 53
- Audiobox: Unified Audio Generation with Natural Language Prompts
  Paper • 2312.15821 • Published • 12

- aMUSEd: An Open MUSE Reproduction
  Paper • 2401.01808 • Published • 28
- From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
  Paper • 2401.01885 • Published • 27
- SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity
  Paper • 2401.00604 • Published • 4
- LARP: Language-Agent Role Play for Open-World Games
  Paper • 2312.17653 • Published • 30

- Gemini: A Family of Highly Capable Multimodal Models
  Paper • 2312.11805 • Published • 45
- Unlocking Pre-trained Image Backbones for Semantic Image Synthesis
  Paper • 2312.13314 • Published • 7
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  Paper • 2312.11514 • Published • 258
- Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
  Paper • 2312.09911 • Published • 53

- Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
  Paper • 2312.09911 • Published • 53
- GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning
  Paper • 2312.11461 • Published • 18
- PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar
  Paper • 2312.14239 • Published • 10
- DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision
  Paper • 2312.16256 • Published • 15