sanchit-gandhi and ylacombe committed
Commit 62daf97
1 Parent(s): aaf6705

Create README.md (#5)


- Create README.md (841e770848e554829f0f7f4c3496f48b34aaea5c)
- update license (9e16467064b01cb34acddf038e46766b39232ce6)


Co-authored-by: Yoach Lacombe <[email protected]>

Files changed (1): README.md (+146, -0)
README.md ADDED
---
license: cc-by-nc-4.0
tags:
- encodec
- audio
- music
- audiocraft
---

<a target="_blank" href="https://colab.research.google.com/drive/1JlTOjB-G0A2Hz3h8PK63vLZk4xdCI5QB?usp=sharing">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
<br>

# MultiBand Diffusion

This repository contains the weights for Meta's MultiBand Diffusion models, described in the research paper [From Discrete Tokens to High Fidelity Audio using MultiBand Diffusion][arxiv].

MultiBand Diffusion is a collection of 4 models that decode tokens from the <a href="https://github.com/facebookresearch/encodec">EnCodec tokenizer</a> into waveform audio: one decoder for MusicGen tokens and three for EnCodec at 24 kHz, one per supported bitrate (see the API section below).
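
A minimal sketch of that decoding loop (using only the `MultiBandDiffusion` calls shown in the examples below; the sine wave stands in for real audio):

```python
import torch
from audiocraft.models import MultiBandDiffusion

# Load the diffusion decoder for EnCodec tokens at 24 kHz / 3 kbps
# (one of the four pre-trained models in this repository).
mbd = MultiBandDiffusion.get_mbd_24khz(bw=3.0)

# Round-trip one second of audio: `regenerate` encodes the waveform
# into EnCodec tokens, then decodes those tokens with diffusion.
wav = torch.sin(2 * torch.pi * 440 * torch.arange(24000) / 24000)[None, None]
decoded = mbd.regenerate(wav, sample_rate=24000)
```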

## Model Details

### Model Description

- **Developed by:** Meta
- **Model type:** Diffusion Models
- **License:** The model weights in this repository are released under the CC-BY-NC 4.0 license.

### Model Sources

- **Repository:** [AudioCraft repo](https://github.com/facebookresearch/audiocraft/tree/main)
- **Paper:** [From Discrete Tokens to High Fidelity Audio using MultiBand Diffusion](https://dl.fbaipublicfiles.com/encodec/Diffusion/paper.pdf)

## Installation

Please follow the AudioCraft installation instructions in the [AudioCraft README](https://github.com/facebookresearch/audiocraft/blob/main/README.md).
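
For reference, a typical setup looks like the following sketch (assuming a recent Python with a working PyTorch install; the AudioCraft README remains the authoritative source):

```shell
# Install the audiocraft package from PyPI.
python -m pip install -U audiocraft

# ffmpeg is required for audio input/output.
sudo apt-get install ffmpeg
```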

## Usage

The [AudioCraft library](https://github.com/facebookresearch/audiocraft/tree/main) offers a number of ways to use MultiBand Diffusion:
1. The MusicGen demo includes a toggle to try the diffusion decoder. You can run the demo locally with [`python -m demos.musicgen_app --share`](https://github.com/facebookresearch/audiocraft/tree/main/demos/musicgen_app.py), or use the [MusicGen Colab](https://colab.research.google.com/drive/1JlTOjB-G0A2Hz3h8PK63vLZk4xdCI5QB?usp=sharing).
2. You can play with MusicGen by running the Jupyter notebook at [`demos/musicgen_demo.ipynb`](https://github.com/facebookresearch/audiocraft/tree/main/demos/musicgen_demo.ipynb) locally (if you have a GPU).

## API

The [AudioCraft library](https://github.com/facebookresearch/audiocraft/tree/main) provides a simple API and pre-trained models for MusicGen and for EnCodec at 24 kHz at 3 bitrates (1.5 kbps, 3 kbps and 6 kbps).

Below is a quick example of using MultiBandDiffusion with the MusicGen API:

```python
import torchaudio
from audiocraft.models import MusicGen, MultiBandDiffusion
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('facebook/musicgen-melody')
mbd = MultiBandDiffusion.get_mbd_musicgen()
model.set_generation_params(duration=8)  # generate 8 seconds.

wav, tokens = model.generate_unconditional(4, return_tokens=True)  # generates 4 unconditional audio samples and keeps the tokens for MBD decoding.
wav_diffusion = mbd.tokens_to_wav(tokens)

descriptions = ['happy rock', 'energetic EDM', 'sad jazz']
wav, tokens = model.generate(descriptions, return_tokens=True)  # generates 3 text-conditioned samples and keeps the tokens.
wav_diffusion = mbd.tokens_to_wav(tokens)

melody, sr = torchaudio.load('./assets/bach.mp3')
# Generates using the melody from the given audio and the provided descriptions; returns audio and audio tokens.
wav, tokens = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr, return_tokens=True)
wav_diffusion = mbd.tokens_to_wav(tokens)

for idx, one_wav in enumerate(wav):
    # Saves under {idx}.wav and {idx}_diffusion.wav, with loudness normalization at -14 dB LUFS for comparing the two decoders.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
    audio_write(f'{idx}_diffusion', wav_diffusion[idx].cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
```

For the compression task (and to compare with [EnCodec](https://github.com/facebookresearch/encodec)):

```python
import torch
from audiocraft.models import MultiBandDiffusion
from encodec import EncodecModel
from encodec.utils import convert_audio
from audiocraft.data.audio import audio_read, audio_write

bandwidth = 3.0  # 1.5, 3.0, or 6.0 kbps
mbd = MultiBandDiffusion.get_mbd_24khz(bw=bandwidth)
encodec = EncodecModel.encodec_model_24khz()
encodec.set_target_bandwidth(bandwidth)

somepath = ''  # path to an input audio file
wav, sr = audio_read(somepath)
# Resample to the 24 kHz mono format both models expect and add a batch dimension.
wav = convert_audio(wav, sr, encodec.sample_rate, encodec.channels)[None]
with torch.no_grad():
    compressed_encodec = encodec(wav)
    compressed_diffusion = mbd.regenerate(wav, sample_rate=encodec.sample_rate)

audio_write('sample_encodec', compressed_encodec.squeeze(0).cpu(), mbd.sample_rate, strategy="loudness", loudness_compressor=True)
audio_write('sample_diffusion', compressed_diffusion.squeeze(0).cpu(), mbd.sample_rate, strategy="loudness", loudness_compressor=True)
```

## Training

The [DiffusionSolver](https://github.com/facebookresearch/audiocraft/tree/main/audiocraft/solvers/diffusion.py) implements Meta's diffusion training pipeline.
It generates waveform audio conditioned on the embeddings extracted from a pre-trained EnCodec model
(see the [EnCodec documentation in the AudioCraft library](https://github.com/facebookresearch/audiocraft/tree/main/ENCODEC.md) for more details on how to train such a model).

Note that **the library does NOT provide any of the datasets** used for training our diffusion models.
We provide a dummy dataset containing just a few examples for illustrative purposes.

### Example configurations and grids

You can train diffusion models as described in the paper by using this [dora grid](https://github.com/facebookresearch/audiocraft/tree/main/audiocraft/grids/diffusion/4_bands_base_32khz.py):
```shell
# 4-band MBD training
dora grid diffusion.4_bands_base_32khz
```

### Learn more

Learn more about AudioCraft training pipelines in the [dedicated section](https://github.com/facebookresearch/audiocraft/tree/main/TRAINING.md).

## Citation

```
@article{sanroman2023fromdi,
  title={From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion},
  author={San Roman, Robin and Adi, Yossi and Deleforge, Antoine and Serizel, Romain and Synnaeve, Gabriel and Défossez, Alexandre},
  journal={arXiv preprint arXiv:},
  year={2023}
}
```

[arxiv]: https://dl.fbaipublicfiles.com/encodec/Diffusion/paper.pdf
[mbd_samples]: https://ai.honu.io/papers/mbd/