soujanyaporia committed
Commit f3f9f1f • 1 parent: 477d4f4

Update README.md

Files changed (1): README.md +17 -3

README.md CHANGED
@@ -10,13 +10,17 @@ tags:
 
 # Mustango: Toward Controllable Text-to-Music Generation
 
-[Demo]() [Model](https://replicate.com/declare-lab/mustango) [Website and Examples](https://amaai-lab.github.io/mustango/) [Paper](https://arxiv.org/abs/2311.08355) [Dataset](https://huggingface.co/datasets/amaai-lab/MusicBench)
+[Demo](https://replicate.com/declare-lab/mustango) | [Model](https://huggingface.co/declare-lab/mustango) | [Website and Examples](https://amaai-lab.github.io/mustango/) | [Paper](https://arxiv.org/abs/2311.08355) | [Dataset](https://huggingface.co/datasets/amaai-lab/MusicBench)
+
+[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/declare-lab/mustango)
 </div>
 
 Meet Mustango, an exciting addition to the vibrant landscape of Multimodal Large Language Models designed for controlled music generation. Mustango leverages Latent Diffusion Model (LDM), Flan-T5, and musical features to do the magic!
 
+🔥 Live demo available on [Replicate](https://replicate.com/declare-lab/mustango) and [HuggingFace](https://huggingface.co/spaces/declare-lab/mustango).
+
 <div align="center">
-<img src="mustango.jpg" width="500"/>
+<img src="img/mustango.jpg" width="500"/>
 </div>
 
 
@@ -38,12 +42,22 @@ sf.write(f"{prompt}.wav", audio, samplerate=16000)
 IPython.display.Audio(data=audio, rate=16000)
 ```
 
+## Installation
+
+```bash
+git clone https://github.com/AMAAI-Lab/mustango
+cd mustango
+pip install -r requirements.txt
+cd diffusers
+pip install -e .
+```
+
 ## Datasets
 
 The [MusicBench](https://huggingface.co/datasets/amaai-lab/MusicBench) dataset contains 52k music fragments with a rich music-specific text caption.
 ## Subjective Evaluation by Expert Listeners
 
-| **Model** | **Dataset** | **Pre-trained** | **Relevance** ↑ | **Chord Match** ↑ | **Tempo Match** ↑ | **Audio Quality** ↑ | **Musicality** ↑ | **Rhythmic Presence and Stability** ↑ | **Harmony and Consonance** ↑ |
+| **Model** | **Dataset** | **Pre-trained** | **Overall Match** ↑ | **Chord Match** ↑ | **Tempo Match** ↑ | **Audio Quality** ↑ | **Musicality** ↑ | **Rhythmic Presence and Stability** ↑ | **Harmony and Consonance** ↑ |
 |-----------|-------------|:-----------------:|:-----------:|:-----------:|:-----------:|:----------:|:----------:|:----------:|:----------:|
 | Tango | MusicCaps | ✓ | 4.35 | 2.75 | 3.88 | 3.35 | 2.83 | 3.95 | 3.84 |
 | Tango | MusicBench | ✓ | 4.91 | 3.61 | 3.86 | 3.88 | 3.54 | 4.01 | 4.34 |
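For readers comparing the two Tango rows in the evaluation table, here is a minimal Python sketch (not part of this commit). It computes how each expert-listener score changes when Tango is trained on MusicBench instead of MusicCaps. The scores are transcribed by hand from the table. The names `METRICS`, `TANGO_MUSICCAPS`, `TANGO_MUSICBENCH`, and `score_deltas` are hypothetical and introduced here only for illustration.

```python
# Expert-listener scores for Tango, transcribed by hand from the table in this commit.
# All names below are hypothetical, chosen only for this illustration.
METRICS = [
    "Overall Match",
    "Chord Match",
    "Tempo Match",
    "Audio Quality",
    "Musicality",
    "Rhythmic Presence and Stability",
    "Harmony and Consonance",
]
TANGO_MUSICCAPS = [4.35, 2.75, 3.88, 3.35, 2.83, 3.95, 3.84]
TANGO_MUSICBENCH = [4.91, 3.61, 3.86, 3.88, 3.54, 4.01, 4.34]


def score_deltas(baseline, candidate):
    """Return {metric: candidate - baseline}, rounded to two decimals."""
    return {
        name: round(new - old, 2)
        for name, old, new in zip(METRICS, baseline, candidate)
    }


deltas = score_deltas(TANGO_MUSICCAPS, TANGO_MUSICBENCH)
for name, delta in deltas.items():
    # Signed difference per metric; positive means MusicBench training scored higher.
    print(f"{name:35s} {delta:+.2f}")
```

Run as-is, it prints one signed difference per metric. Tempo Match is the only metric where the MusicBench-trained model scores lower, by 0.02.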