soujanyaporia committed f3f9f1f (1 parent: 477d4f4)
Update README.md

README.md CHANGED
@@ -10,13 +10,17 @@ tags:

# Mustango: Toward Controllable Text-to-Music Generation

-[Demo]() [Model](https://
</div>

Meet Mustango, an exciting addition to the vibrant landscape of Multimodal Large Language Models designed for controlled music generation. Mustango leverages Latent Diffusion Model (LDM), Flan-T5, and musical features to do the magic!

<div align="center">
-<img src="mustango.jpg" width="500"/>
</div>


@@ -38,12 +42,22 @@ sf.write(f"{prompt}.wav", audio, samplerate=16000)
IPython.display.Audio(data=audio, rate=16000)
```

## Datasets

The [MusicBench](https://huggingface.co/datasets/amaai-lab/MusicBench) dataset contains 52k music fragments with a rich music-specific text caption.
## Subjective Evaluation by Expert Listeners

-| **Model** | **Dataset** | **Pre-trained** | **
|-----------|-------------|:-----------------:|:-----------:|:-----------:|:-----------:|:----------:|:----------:|:----------:|:----------:|
| Tango | MusicCaps | ✓ | 4.35 | 2.75 | 3.88 | 3.35 | 2.83 | 3.95 | 3.84 |
| Tango | MusicBench | ✓ | 4.91 | 3.61 | 3.86 | 3.88 | 3.54 | 4.01 | 4.34 |

# Mustango: Toward Controllable Text-to-Music Generation

+[Demo](https://replicate.com/declare-lab/mustango) | [Model](https://huggingface.co/declare-lab/mustango) | [Website and Examples](https://amaai-lab.github.io/mustango/) | [Paper](https://arxiv.org/abs/2311.08355) | [Dataset](https://huggingface.co/datasets/amaai-lab/MusicBench)
+
+[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/declare-lab/mustango)
</div>

Meet Mustango, an exciting addition to the vibrant landscape of Multimodal Large Language Models designed for controlled music generation. Mustango leverages Latent Diffusion Model (LDM), Flan-T5, and musical features to do the magic!

+🔥 Live demo available on [Replicate](https://replicate.com/declare-lab/mustango) and [HuggingFace](https://huggingface.co/spaces/declare-lab/mustango).
+
<div align="center">
+<img src="img/mustango.jpg" width="500"/>
</div>


IPython.display.Audio(data=audio, rate=16000)
```

+## Installation
+
+```bash
+git clone https://github.com/AMAAI-Lab/mustango
+cd mustango
+pip install -r requirements.txt
+cd diffusers
+pip install -e .
+```
+
## Datasets

The [MusicBench](https://huggingface.co/datasets/amaai-lab/MusicBench) dataset contains 52k music fragments with a rich music-specific text caption.
## Subjective Evaluation by Expert Listeners

+| **Model** | **Dataset** | **Pre-trained** | **Overall Match** ↑ | **Chord Match** ↑ | **Tempo Match** ↑ | **Audio Quality** ↑ | **Musicality** ↑ | **Rhythmic Presence and Stability** ↑ | **Harmony and Consonance** ↑ |
|-----------|-------------|:-----------------:|:-----------:|:-----------:|:-----------:|:----------:|:----------:|:----------:|:----------:|
| Tango | MusicCaps | ✓ | 4.35 | 2.75 | 3.88 | 3.35 | 2.83 | 3.95 | 3.84 |
| Tango | MusicBench | ✓ | 4.91 | 3.61 | 3.86 | 3.88 | 3.54 | 4.01 | 4.34 |
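
The two Tango rows shown in this hunk differ only in the training dataset, so the per-metric change can be read off with a little arithmetic. The following is a minimal sketch of that comparison; the ratings are copied from the rows above, while the names `METRICS`, `RATINGS`, `rating_deltas`, and `mean` are illustrative and not part of the Mustango repository.

```python
# Expert-listener ratings copied from the two Tango rows in the table above.
# All names below are illustrative; nothing here comes from the Mustango code.
METRICS = [
    "Overall Match",
    "Chord Match",
    "Tempo Match",
    "Audio Quality",
    "Musicality",
    "Rhythmic Presence and Stability",
    "Harmony and Consonance",
]
RATINGS = {
    "MusicCaps": [4.35, 2.75, 3.88, 3.35, 2.83, 3.95, 3.84],
    "MusicBench": [4.91, 3.61, 3.86, 3.88, 3.54, 4.01, 4.34],
}


def rating_deltas(baseline, other):
    """Map each metric to (other - baseline), rounded to two decimals."""
    return {m: round(o - b, 2) for m, b, o in zip(METRICS, baseline, other)}


def mean(values):
    """Arithmetic mean of a non-empty list of ratings."""
    return sum(values) / len(values)


deltas = rating_deltas(RATINGS["MusicCaps"], RATINGS["MusicBench"])
for metric, delta in deltas.items():
    print(f"{metric:<32} {delta:+.2f}")
for dataset, values in RATINGS.items():
    print(f"Mean rating, Tango on {dataset:<11} {mean(values):.2f}")
```

For these two rows, Tempo Match is the only metric that drops (by 0.02), and Chord Match shows the largest gain (+0.86).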