slseanwu/compose-and-embellish-pop1k7

music-generation

slseanwu committed on Mar 5, 2023

Commit 2fd8c86 (parent: f616fc5)

complete model characteristics

Files changed (1)

README.md +15 -0

@@ -17,8 +17,23 @@ Trained model weights and training datasets for the paper:
 ### Stage 1: "Compose" model
 Generates **melody and chord progression** from scratch.
+  - Model backbone: 12-layer Transformer w/ relative positional encoding
+  - Num trainable params: 41.3M
+  - Token vocabulary: [Revamped MIDI-derived events](https://arxiv.org/abs/2002.00212) (**REMI**) w/ slight modifications
+  - Pretraining dataset: subset of [Lakh MIDI full](https://colinraffel.com/projects/lmd/) (**LMD-full**), 14934 songs
+    - melody extraction (and data filtering) done by **matching lyrics to tracks**: https://github.com/gulnazaki/lyrics-melody/blob/main/pre-processing/create_dataset.py
+    - structural segmentation done with **A\* search**: https://github.com/Dsqvival/hierarchical-structure-analysis
+  - Finetuning dataset: subset of [AILabs.tw Pop1K7](https://github.com/YatingMusic/compound-word-transformer) (**Pop1K7**), 1591 songs
+    - melody extraction done with **skyline algorithm**: https://github.com/wazenmai/MIDI-BERT/blob/CP/melody_extraction/skyline/analyzer.py
+    - structural segmentation done in the same way as for the pretraining dataset
+  - Training sequence length: 2400 tokens
 ### Stage 2: "Embellish" model
 Generates **accompaniment, timing and dynamics** conditioned on Stage 1 outputs.
+  - Model backbone: 12-layer **Performer** ([paper](https://arxiv.org/abs/2009.14794), [implementation](https://github.com/idiap/fast-transformers))
+  - Num trainable params: 38.2M
+  - Token vocabulary: [Revamped MIDI-derived events](https://arxiv.org/abs/2002.00212) (**REMI**) w/ slight modifications
+  - Training dataset: [AILabs.tw Pop1K7](https://github.com/YatingMusic/compound-word-transformer) (**Pop1K7**), 1747 songs
+  - Training sequence length: 3072 tokens
 ## BibTex
 If you find the materials useful, please consider citing our work:
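
The Pop1K7 melody line is extracted with the skyline algorithm linked above. The sketch below illustrates the general idea and is not a port of the linked `analyzer.py`. Among notes that start together, it keeps the highest-pitched one. It then shortens each kept note so it ends no later than the next melody onset. The `Note` class and its tick-based fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    onset: int   # start time in ticks (hypothetical representation)
    offset: int  # end time in ticks
    pitch: int   # MIDI pitch number


def skyline(notes: list[Note]) -> list[Note]:
    """Minimal skyline melody extraction (illustrative sketch)."""
    # Step 1: for each onset time, keep only the highest-pitched note.
    highest: dict[int, Note] = {}
    for n in notes:
        cur = highest.get(n.onset)
        if cur is None or n.pitch > cur.pitch:
            highest[n.onset] = n
    melody = [highest[t] for t in sorted(highest)]

    # Step 2: truncate each note so it does not overlap the next melody note.
    result: list[Note] = []
    for i, n in enumerate(melody):
        end = n.offset
        if i + 1 < len(melody):
            end = min(end, melody[i + 1].onset)
        result.append(Note(n.onset, end, n.pitch))
    return result


notes = [Note(0, 480, 60), Note(0, 480, 72), Note(240, 720, 67), Note(480, 960, 74)]
print(skyline(notes))
# [Note(onset=0, offset=240, pitch=72), Note(onset=240, offset=480, pitch=67), Note(onset=480, offset=960, pitch=74)]
```

In this sketch, a lower note that starts while a higher note is still sounding is kept, and the higher note is cut short. Other skyline variants handle that case differently.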
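
Both stages use a REMI-style token vocabulary. The sketch below shows roughly how REMI serializes notes into bar, position, pitch and duration events. It assumes 4/4 time, 480 ticks per beat and a 16-step grid per bar. It omits REMI's velocity, tempo and chord events and does not reproduce the authors' modified vocabulary. The function name `remi_like_tokens` and all token spellings are hypothetical.

```python
from dataclasses import dataclass

TICKS_PER_BEAT = 480      # assumed MIDI resolution
BEATS_PER_BAR = 4         # assumed 4/4 time
POSITIONS_PER_BAR = 16    # 16th-note grid
TICKS_PER_POS = TICKS_PER_BEAT * BEATS_PER_BAR // POSITIONS_PER_BAR  # 120 ticks


@dataclass(frozen=True)
class Note:
    onset: int   # start time in ticks
    offset: int  # end time in ticks
    pitch: int   # MIDI pitch number


def remi_like_tokens(notes: list[Note]) -> list[str]:
    """Serialize notes into a simplified REMI-like event sequence."""
    tokens: list[str] = []
    current_bar = -1
    current_pos = None
    for x in sorted(notes, key=lambda m: (m.onset, m.pitch)):
        step = round(x.onset / TICKS_PER_POS)       # quantize onset to the grid
        bar, pos = divmod(step, POSITIONS_PER_BAR)
        while current_bar < bar:                     # one Bar token per bar, empty bars included
            current_bar += 1
            tokens.append("Bar")
            current_pos = None
        if pos != current_pos:                       # emit Position only when it changes
            tokens.append(f"Position_{pos}/{POSITIONS_PER_BAR}")
            current_pos = pos
        dur = max(1, round((x.offset - x.onset) / TICKS_PER_POS))  # duration in grid steps
        tokens.append(f"Note-On_{x.pitch}")
        tokens.append(f"Note-Duration_{dur}")
    return tokens


notes = [Note(0, 480, 60), Note(0, 480, 64), Note(960, 1440, 67), Note(1920, 2400, 72)]
print(remi_like_tokens(notes))
# ['Bar', 'Position_0/16', 'Note-On_60', 'Note-Duration_4', 'Note-On_64', 'Note-Duration_4',
#  'Position_8/16', 'Note-On_67', 'Note-Duration_4', 'Bar', 'Position_0/16', 'Note-On_72', 'Note-Duration_4']
```

Notes that share a grid position share a single `Position` token. This keeps the sequence shorter than emitting a time-shift token for every note.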