fffiloni committed on
Commit 1a0c6ef
1 Parent(s): df7c985

Update README.md

Files changed (1)
  1. README.md +153 -2

README.md CHANGED
@@ -1,6 +1,6 @@
  ---
  title: MusiConGen
- emoji: 🚀
  colorFrom: green
  colorTo: blue
  sdk: gradio
@@ -8,5 +8,156 @@ sdk_version: 4.39.0
  app_file: app.py
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
  title: MusiConGen
+ emoji: 🪩
  colorFrom: green
  colorTo: blue
  sdk: gradio
  app_file: app.py
  pinned: false
  ---
+ # MusiConGen
+
+ This is the official implementation of the paper "MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation," in Proc. of the Int. Society for Music Information Retrieval Conf. (ISMIR), 2024.
+
+ MusiConGen is built on the pretrained [MusicGen](https://github.com/facebookresearch/audiocraft) and adds two controls: rhythm and chords. The repository contains the inference code, the training code, and the training data (a YouTube list).
+
+ <br />
+
+ [arXiv Paper]() | [Demo](https://musicongen.github.io/musicongen_demo/)
+
+ <br />
+
+ ## Installation
+ MusiConGen requires Python 3.9 and PyTorch 2.0.0. Install the dependencies with:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ We also recommend having `ffmpeg` installed, either through your system or Anaconda:
+ ```bash
+ sudo apt-get install ffmpeg
+ # Or if you are using Anaconda or Miniconda
+ conda install 'ffmpeg<5' -c conda-forge
+ ```
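+
+ For reference, a minimal environment setup might look like the following. This is only a sketch: the environment name and the CUDA/PyTorch wheel choice are assumptions, and only Python 3.9 / PyTorch 2.0.0 are actually prescribed above.
+ ```bash
+ # Hypothetical setup sketch (environment name and CUDA choice are up to you).
+ conda create -n musicongen python=3.9 -y
+ conda activate musicongen
+ pip install torch==2.0.0         # pick the wheel matching your CUDA version
+ pip install -r requirements.txt  # installs the remaining dependencies
+ ```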
+
+ <br />
+
+ ## Model
+ The model is based on the pretrained MusicGen-melody (1.5B). For inference, a GPU with more than 12 GB of VRAM is recommended; for training, a GPU with more than 24 GB of VRAM is recommended.
+
+ ## Inference
+
+ First, download the model weights from [this link](https://huggingface.co/Cyan0731/MusiConGen/tree/main).
+ Move `compression_state_dict.bin` and `state_dict.bin` to the directory `audiocraft/ckpt/musicongen`.
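+
+ For example, assuming the two files were downloaded into the current directory, placing them could look like this (a sketch; the download itself is done from the Hugging Face page linked above):
+ ```shell
+ # Hypothetical: put the downloaded checkpoints where the inference script expects them.
+ mkdir -p audiocraft/ckpt/musicongen
+ mv compression_state_dict.bin state_dict.bin audiocraft/ckpt/musicongen/
+ ```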
+
+ Then run the inference script to generate music conditioned on chords and rhythm:
+ ```shell
+ cd audiocraft
+ python generate_chord_beat.py
+ ```
+
+ <br />
+
+ ## Training
+
+ ### Training Data
+ The training data is provided in JSON format in `5_genre_songs_list.json`. Each listed entry is the suffix (video ID) of a YouTube link.
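+
+ The README does not prescribe a download tool; as one possible way to fetch the audio for a listed suffix, a tool such as `yt-dlp` could be used (this is an assumption, not part of the repository):
+ ```shell
+ # Hypothetical: download one entry as audio, where <VIDEO_ID> is a suffix from the JSON list.
+ yt-dlp -x --audio-format wav "https://www.youtube.com/watch?v=<VIDEO_ID>"
+ ```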
+
+ ### Data Preprocessing
+ Before training, place your audio data in `audiocraft/dataset/$DIR_OF_YOUR_DATA$/full`.
+ Then run the preprocessing steps one by one:
+
+ ```shell
+ cd preproc
+ ```
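+
+ As a rough illustration of the expected layout: only the `full` sub-directory is prescribed above, and the `clip` sub-directory mentioned in step 2 is assumed to sit next to it.
+ ```shell
+ # Hypothetical layout of one dataset directory:
+ # audiocraft/dataset/$DIR_OF_YOUR_DATA$/
+ # ├── full/   # original full-length audio files ('mp3' or 'wav')
+ # └── clip/   # cropped clips produced in step 2 below
+ ```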
+
+ ### 1. Demixing tracks
+ To remove the vocal stem from each track, we use [Demucs](https://github.com/facebookresearch/demucs).
+ In `main.py`, change `path_rootdir` to your data directory and `ext_src` to the audio extension of your dataset (`'mp3'` or `'wav'`).
+
+ ```shell
+ cd 0_demix
+ python main.py
+ ```
+
+ <br />
+
+ ### 2. Beat/downbeat detection and cropping
+ To extract the beats and downbeats of the songs, you can use either [BeatNet](https://github.com/mjhydri/BeatNet) or [Madmom](https://github.com/CPJKU/madmom) as the beat extractor.
+ For BeatNet, change `path_rootdir` to your directory in `main_beat_nn.py`; for Madmom, change `path_rootdir` in `main_beat.py`.
+
+ Then, according to the extracted beats and downbeats, each song is cropped into clips by `main_crop.py`; its `path_rootdir` should also be changed to your dataset directory.
+
+ The last stage filters out clips with low volume; here `path_rootdir` should point to the `clip` directory.
+
+ ```shell
+ cd 1_beats-crop
+ python main_beat.py
+ python main_crop.py
+ python main_filter.py
+ ```
+
+ <br />
+
+ ### 3. Chord extraction
+ To extract chord progressions, we use [BTC-ISMIR2019](https://github.com/jayg996/BTC-ISMIR19).
+ Change `root_dir` in `main.py` to your clips data directory.
+
+ ```shell
+ cd 2_chord/BTC-ISMIR19
+ python main.py
+ ```
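+
+ If the extractor follows the common `.lab` convention (one `start end chord` triple per line), the output for one clip might look like this; the path and values are purely illustrative:
+ ```shell
+ # Hypothetical chord output for one clip (times in seconds).
+ cat <YOUR_CLIPS_DIR>/song_000.lab
+ # 0.000   2.230   C
+ # 2.230   4.460   G
+ # 4.460   6.690   Am
+ ```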
+
+ <br />
+
+ ### 4. Tags/description labeling (optional)
+ For a dataset crawled from a website (e.g. YouTube), the description of each song can be obtained from the crawled information in `crawl_info.json` (you can change the file name in `3_1_ytjsons2tags/main.py`). We use the title of the YouTube video as the description. Change `root_dir` in `main.py` to your clips data directory.
+
+ ```shell
+ cd 3_1_ytjsons2tags
+ python main.py
+ ```
+
+ For datasets without descriptive information, you can use [Essentia](https://github.com/MTG/essentia) to extract instrument and genre tags.
+ ```shell
+ cd 3_tags/essentia
+ python main.py
+ ```
+
+ After the JSON files are created, run `dump_jsonl.py` to generate the JSONL file in the training directory.
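+
+ For orientation, audiocraft-style training manifests are JSON-lines files in which each line describes one audio clip. A hypothetical entry might look like the following; the manifest path and the exact fields produced by `dump_jsonl.py` may differ:
+ ```shell
+ # Hypothetical: inspect one line of the generated manifest.
+ head -n 1 egs/example/data.jsonl
+ # {"path": "dataset/my_data/clip/song_000.wav", "duration": 30.0, "sample_rate": 32000, "amplitude": null, "weight": null, "info_path": null}
+ ```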
+
+ <br />
+
+ ### Training stage
+ The training weights of MusiConGen are available at [this link](https://huggingface.co/Cyan0731/MusiConGen_training/tree/main). Place them in the directory `MusiConGen/audiocraft/training_weights/xps/musicongen`.
+
+ Before training, set your username in the environment variable:
+ ```shell
+ export USER=$YOUR_USER_NAME
+ ```
+
+ To fine-tune on a single GPU, use the following command:
+ ```shell
+ dora run solver=musicgen/single_finetune \
+     conditioner=chord2music_inattn.yaml \
+     continue_from=//sig/musicongen \
+     compression_model_checkpoint=//pretrained/facebook/encodec_32khz \
+     model/lm/model_scale=medium dset=audio/example \
+     transformer_lm.n_q=4 transformer_lm.card=2048
+ ```
+ The `continue_from` argument can also be set to the absolute path of your own checkpoint.
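+
+ For example, a run that resumes from your own checkpoint might look like this; the checkpoint path is hypothetical, and everything else matches the command above:
+ ```shell
+ dora run solver=musicgen/single_finetune \
+     conditioner=chord2music_inattn.yaml \
+     continue_from=/absolute/path/to/your/checkpoint.th \
+     compression_model_checkpoint=//pretrained/facebook/encodec_32khz \
+     model/lm/model_scale=medium dset=audio/example \
+     transformer_lm.n_q=4 transformer_lm.card=2048
+ ```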
+
+ To fine-tune on multiple (e.g. 4) GPUs, use the following command:
+ ```shell
+ dora run -d solver=musicgen/multigpu_finetune \
+     conditioner=chord2music_inattn.yaml \
+     continue_from=//sig/musicongen \
+     compression_model_checkpoint=//pretrained/facebook/encodec_32khz \
+     model/lm/model_scale=medium dset=audio/example \
+     transformer_lm.n_q=4 transformer_lm.card=2048
+ ```
+
+ <br />
+
+ ### Export weights
+ Use `export_weight.py` with your training signature `sig` to export your weights to `output_dir`.
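+
+ A hypothetical invocation is sketched below; check `export_weight.py` for the actual interface, since `sig` and `output_dir` may be set inside the script rather than on the command line:
+ ```shell
+ cd audiocraft
+ python export_weight.py   # set `sig` (your training signature) and `output_dir` first
+ ```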
+
+ <br />
+
+ ## License
+ The code and model weights are released under this repository's [LICENSE file](https://github.com/Cyan0731/MusiConGen/blob/main/LICENSE), together with MusicGen's [LICENSE file](https://github.com/facebookresearch/audiocraft/blob/main/LICENSE) and [LICENSE_weights file](https://github.com/facebookresearch/audiocraft/blob/main/LICENSE_weights).