Audio Models - a PeppePasti Collection

Audio Models

updated Sep 24

Foundation Models for Music: A Survey
Paper • 2408.14340 • Published Aug 26 • 39

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Paper • 2408.16532 • Published Aug 29 • 46

FLUX that Plays Music
Paper • 2409.00587 • Published Sep 1 • 31

FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation
Paper • 2409.02245 • Published Sep 3 • 9

SongCreator: Lyrics-based Universal Song Generation
Paper • 2409.06029 • Published Sep 9 • 19

Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis
Paper • 2409.06135 • Published Sep 10 • 14

Seed-Music: A Unified Framework for High Quality and Controlled Music Generation
Paper • 2409.09214 • Published Sep 13 • 46

AudioBERT: Audio Knowledge Augmented Language Model
Paper • 2409.08199 • Published Sep 12 • 4

Zero-shot Cross-lingual Voice Transfer for TTS
Paper • 2409.13910 • Published Sep 20 • 7