slSeanWU committed
Commit b477383
2 Parent(s): 1e3624b 1ac6165

Merge branch 'main' of https://huggingface.co/slseanwu/compose-and-embellish-pop1k7 into main

Files changed (1):
  README.md +17 -1
README.md CHANGED
@@ -5,9 +5,10 @@ tags:
 - pytorch
 - audio
 - music
+- piano
 license: mit
 ---
-# Compose & Embellish
+# Compose & Embellish: Piano Performance Generation Pipeline
 Trained model weights and training datasets for the paper:
 * Shih-Lun Wu and Yi-Hsuan Yang
   "[Compose & Embellish: Well-Structured Piano Performance Generation via A Two-Stage Approach](https://arxiv.org/abs/2209.08212)."
 
@@ -17,8 +18,23 @@ Trained model weights and training datasets for the paper:
 ### Stage 1: "Compose" model
 Generates **melody and chord progression** from scratch.
 
+- Model backbone: 12-layer Transformer w/ relative positional encoding
+- Num trainable params: 41.3M
+- Token vocabulary: [Revamped MIDI-derived events](https://arxiv.org/abs/2002.00212) (**REMI**) w/ slight modifications
+- Pretraining dataset: subset of [Lakh MIDI full](https://colinraffel.com/projects/lmd/) (**LMD-full**), 14934 songs
+  - melody extraction (and data filtering) done by **matching lyrics to tracks**: https://github.com/gulnazaki/lyrics-melody/blob/main/pre-processing/create_dataset.py
+  - structural segmentation done with **A\* search**: https://github.com/Dsqvival/hierarchical-structure-analysis
+- Finetuning dataset: subset of [AILabs.tw Pop1K7](https://github.com/YatingMusic/compound-word-transformer) (**Pop1K7**), 1591 songs
+  - melody extraction done with the **skyline algorithm**: https://github.com/wazenmai/MIDI-BERT/blob/CP/melody_extraction/skyline/analyzer.py
+  - structural segmentation done in the same way as for the pretraining dataset
+- Training sequence length: 2400
 ### Stage 2: "Embellish" model
 Generates **accompaniment, timing and dynamics** conditioned on Stage 1 outputs.
+- Model backbone: 12-layer **Performer** ([paper](https://arxiv.org/abs/2009.14794), [implementation](https://github.com/idiap/fast-transformers))
+- Num trainable params: 38.2M
+- Token vocabulary: [Revamped MIDI-derived events](https://arxiv.org/abs/2002.00212) (**REMI**) w/ slight modifications
+- Training dataset: [AILabs.tw Pop1K7](https://github.com/YatingMusic/compound-word-transformer) (**Pop1K7**), 1747 songs
+- Training sequence length: 3072
 
 ## BibTeX
 If you find the materials useful, please consider citing our work:
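
Both stages tokenize music as REMI events, per the linked paper. As a rough illustration of what such a vocabulary looks like, the sketch below converts a note list into Bar / Position / Note-On / Note-Duration events on a 16-positions-per-bar grid; the event names, the grid resolution, and the `Note` helper are illustrative assumptions (REMI also has velocity, tempo, and chord events), not this repo's exact modified vocabulary.

```python
from dataclasses import dataclass

@dataclass
class Note:
    start: float     # onset time, in quarter-note beats
    pitch: int       # MIDI pitch number (0-127)
    duration: float  # note length, in quarter-note beats

def notes_to_remi(notes, beats_per_bar=4, positions_per_bar=16):
    """Convert a note list into a flat REMI-like event sequence."""
    events, current_bar = [], -1
    subdiv = positions_per_bar // beats_per_bar  # grid steps per beat
    for note in sorted(notes, key=lambda n: n.start):
        bar = int(note.start // beats_per_bar)
        while current_bar < bar:          # emit a Bar marker per new bar
            current_bar += 1
            events.append("Bar")
        # quantize the onset onto the per-bar position grid
        pos = min(int(round((note.start - bar * beats_per_bar) * subdiv)),
                  positions_per_bar - 1)
        events.append(f"Position_{pos + 1}/{positions_per_bar}")
        events.append(f"Note-On_{note.pitch}")
        # clip duration to a largest duration token (here: two bars)
        dur = max(1, min(int(round(note.duration * subdiv)), 2 * positions_per_bar))
        events.append(f"Note-Duration_{dur}")
    return events

# A short fragment: two notes in bar 1, one note in bar 2
print(notes_to_remi([Note(0.0, 60, 1.0), Note(1.0, 64, 0.5), Note(4.0, 67, 2.0)]))
```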
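The skyline melody extraction cited for the Pop1K7 set reduces, at its core, to keeping the highest-pitched note at each onset and truncating any kept note that a later entrance overlaps. Below is a minimal sketch of that idea, assuming notes arrive as `(start, end, pitch)` tuples; the linked MIDI-BERT analyzer is the actual, more elaborate implementation.

```python
def skyline(notes):
    """notes: list of (start, end, pitch) tuples -> monophonic melody line."""
    melody = []
    # sort by onset, then highest pitch first, so the top note wins per onset
    for start, end, pitch in sorted(notes, key=lambda n: (n[0], -n[2])):
        if melody and start == melody[-1][0]:
            continue  # a higher note at this onset has already been kept
        if melody and melody[-1][1] > start:
            # a new onset arrives while the previous note sounds: truncate it
            prev_start, _, prev_pitch = melody[-1]
            melody[-1] = (prev_start, start, prev_pitch)
        melody.append((start, end, pitch))
    return melody

# A held C-major triad with a higher lead note entering on beat 1
print(skyline([(0, 2, 60), (0, 2, 64), (0, 2, 67), (1, 2, 72)]))
# -> [(0, 1, 67), (1, 2, 72)]
```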