EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions • Paper • 2402.17485 • Published Feb 27 • 184
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models • Paper • 2406.02430 • Published about 1 month ago • 27
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning • Paper • 2406.03344 • Published 30 days ago • 15
VideoTetris: Towards Compositional Text-to-Video Generation • Paper • 2406.04277 • Published 29 days ago • 21
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS • Paper • 2406.18009 • Published 9 days ago • 17
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation • Paper • 2407.02869 • Published 2 days ago • 10