facebook/multiband-diffusion


ylacombe (HF staff) committed on Aug 8, 2023

Commit 841e770 • 1 Parent(s): 9db60fe

Create README.md


README.md +146 -0


	@@ -0,0 +1,146 @@

+---
+license: cc-by-4.0
+tags:
+- encodec
+- audio
+- music
+- audiocraft
+---
+<a target="_blank" href="https://colab.research.google.com/drive/1JlTOjB-G0A2Hz3h8PK63vLZk4xdCI5QB?usp=sharing">
+  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
+</a>
+<br>
+# MultiBand Diffusion
+This repository contains the weights for Meta's MultiBand Diffusion models, described in this research paper: [From Discrete Tokens to High Fidelity Audio using MultiBand Diffusion][arxiv].
+MultiBand Diffusion is a collection of 4 models that decode tokens from the <a href="https://github.com/facebookresearch/encodec">EnCodec tokenizer</a> into waveform audio.
+## Model Details
+### Model Description
+- **Developed by:** Meta
+- **Model type:** Diffusion Models
+- **License:** The model weights in this repository are released under the CC-BY-NC 4.0 license.
+### Model Sources
+- **Repository:** [AudioCraft repo](https://github.com/facebookresearch/audiocraft/tree/main)
+- **Paper:** [From Discrete Tokens to High Fidelity Audio using MultiBand Diffusion](https://dl.fbaipublicfiles.com/encodec/Diffusion/paper.pdf)
+## Installation
+Please follow the installation instructions in the [AudioCraft README](https://github.com/facebookresearch/audiocraft/tree/main).
+## Usage
+The [AudioCraft library](https://github.com/facebookresearch/audiocraft/tree/main) offers a number of ways to use MultiBand Diffusion:
+1. A MusicGen demo includes a toggle to try the diffusion decoder. You can use the demo locally by running [`python -m demos.musicgen_app --share`](https://github.com/facebookresearch/audiocraft/tree/main/demos/musicgen_app.py), or through a [MusicGen Colab](https://colab.research.google.com/drive/1JlTOjB-G0A2Hz3h8PK63vLZk4xdCI5QB?usp=sharing).
+2. You can play with MusicGen by running the Jupyter notebook at [`demos/musicgen_demo.ipynb`](https://github.com/facebookresearch/audiocraft/tree/main/demos/musicgen_demo.ipynb) locally (if you have a GPU).
+## API
+The [AudioCraft library](https://github.com/facebookresearch/audiocraft/tree/main) provides a simple API and pre-trained models for MusicGen and for EnCodec at 24 kHz for 3 bitrates (1.5 kbps, 3 kbps and 6 kbps).
+Below is a quick example of using MultiBandDiffusion with the MusicGen API:
+```python
+import torchaudio
+from audiocraft.models import MusicGen, MultiBandDiffusion
+from audiocraft.data.audio import audio_write
+model = MusicGen.get_pretrained('facebook/musicgen-melody')
+mbd = MultiBandDiffusion.get_mbd_musicgen()
+model.set_generation_params(duration=8)  # generate 8 seconds.
+wav, tokens = model.generate_unconditional(4, return_tokens=True)  # generates 4 unconditional audio samples and keeps the tokens for MBD generation
+wav_diffusion = mbd.tokens_to_wav(tokens)
+descriptions = ['happy rock', 'energetic EDM', 'sad jazz']
+wav, tokens = model.generate(descriptions, return_tokens=True)  # generates 3 samples and keeps the tokens
+wav_diffusion = mbd.tokens_to_wav(tokens)
+melody, sr = torchaudio.load('./assets/bach.mp3')
+# Generates using the melody from the given audio and the provided descriptions, returns audio and audio tokens.
+wav, tokens = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr, return_tokens=True)
+wav_diffusion = mbd.tokens_to_wav(tokens)
+for idx, one_wav in enumerate(wav):
+    # Saves {idx}.wav and {idx}_diffusion.wav, with loudness normalization at -14 dB LUFS so the two methods can be compared.
+    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
+    audio_write(f'{idx}_diffusion', wav_diffusion[idx].cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
+```
+For the compression task (and to compare with [EnCodec](https://github.com/facebookresearch/encodec)):
+```python
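+import torch
+from audiocraft.models import MultiBandDiffusion
+from encodec import EncodecModel
+from encodec.utils import convert_audio
+from audiocraft.data.audio import audio_read, audio_write
+bandwidth = 3.0  # 1.5, 3.0, 6.0
+mbd = MultiBandDiffusion.get_mbd_24khz(bw=bandwidth)
+encodec = EncodecModel.encodec_model_24khz()
+encodec.set_target_bandwidth(bandwidth)  # decode both versions at the same bitrate
+somepath = ''  # set this to the audio file to compress
+wav, sr = audio_read(somepath)
+# EnCodec expects a batched [B, C, T] tensor at its own sample rate and channel count.
+wav = convert_audio(wav, sr, encodec.sample_rate, encodec.channels).unsqueeze(0)
+with torch.no_grad():
+    compressed_encodec = encodec(wav)
+    compressed_diffusion = mbd.regenerate(wav, sample_rate=encodec.sample_rate)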
+audio_write('sample_encodec', compressed_encodec.squeeze(0).cpu(), mbd.sample_rate, strategy="loudness", loudness_compressor=True)
+audio_write('sample_diffusion', compressed_diffusion.squeeze(0).cpu(), mbd.sample_rate, strategy="loudness", loudness_compressor=True)
+```
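The three bandwidths accepted by `get_mbd_24khz` can be related to the number of EnCodec codebooks with a little arithmetic. The sketch below is illustrative only: the hop length (320 samples) and codebook size (1024 entries) are assumptions taken from the published EnCodec 24 kHz design, not from this README, and `codebooks_for_bandwidth` is a hypothetical helper rather than part of AudioCraft or EnCodec.

```python
import math

# Assumed EnCodec 24 kHz constants (not stated in this README).
SAMPLE_RATE = 24_000
HOP_LENGTH = 320        # product of the assumed encoder strides 2 * 4 * 5 * 8
CODEBOOK_SIZE = 1024    # entries per codebook, i.e. 10 bits per token


def codebooks_for_bandwidth(bandwidth_kbps: float) -> int:
    """Hypothetical helper: number of codebooks needed for a target bitrate."""
    frame_rate = SAMPLE_RATE / HOP_LENGTH                # token frames per second
    bits_per_token = math.log2(CODEBOOK_SIZE)            # bits carried by one token
    bits_per_second_per_codebook = frame_rate * bits_per_token
    return round(bandwidth_kbps * 1000 / bits_per_second_per_codebook)


for bw in (1.5, 3.0, 6.0):
    print(f"{bw} kbps -> {codebooks_for_bandwidth(bw)} codebooks")
```

Under these assumptions, doubling the bandwidth doubles the number of token streams the diffusion decoder is conditioned on.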
+## Training
+A [DiffusionSolver](https://github.com/facebookresearch/audiocraft/tree/main/audiocraft/solvers/diffusion.py) implements Meta's diffusion training pipeline.
+It generates waveform audio conditioned on the embeddings extracted from a pre-trained EnCodec model
+(see the [EnCodec documentation from the AudioCraft library](https://github.com/facebookresearch/audiocraft/tree/main/ENCODEC.md) for more details on how to train such a model).
+Note that **the library does NOT provide any of the datasets** used for training our diffusion models.
+We provide a dummy dataset containing just a few examples for illustrative purposes.
+### Example configurations and grids
+One can train diffusion models as described in the paper by using this [dora grid](https://github.com/facebookresearch/audiocraft/tree/main/audiocraft/grids/diffusion/4_bands_base_32khz.py).
+```shell
+# 4 bands MBD training
+dora grid diffusion.4_bands_base_32khz
+```
+### Learn more
+Learn more about AudioCraft training pipelines in the [dedicated section](https://github.com/facebookresearch/audiocraft/tree/main/TRAINING.md).
+## Citation
+```
+@article{sanroman2023fromdi,
+  title={From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion},
+  author={San Roman, Robin and Adi, Yossi and Deleforge, Antoine and Serizel, Romain and Synnaeve, Gabriel and Défossez, Alexandre},
+  journal={arXiv preprint arXiv:},
+  year={2023}
+}
+```
+[arxiv]: https://dl.fbaipublicfiles.com/encodec/Diffusion/paper.pdf
+[mbd_samples]: https://ai.honu.io/papers/mbd/