tags:
  - music-generation
  - transformer
  - pytorch
  - audio
  - music
  - piano
license: mit

Compose & Embellish: Piano Performance Generation Pipeline

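Trained model weights and training datasets for the paper "Compose & Embellish: Well-Structured Piano Performance Generation via A Two-Stage Approach" (Wu and Yang, ICASSP 2023).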

Note: The materials here are meant to be used together with our model implementation GitHub repo.

Model characteristics

Stage 1: "Compose" model

Generates melody and chord progression from scratch.

Stage 2: "Embellish" model

Generates accompaniment, timing and dynamics conditioned on Stage 1 outputs.

  • embellish_model_gpt2_pop1k7_loss0.398.bin
    • Model backbone: 12-layer GPT-2 Transformer (implementation)
    • Num trainable params: 38.2M
  • embellish_model_pop1k7_loss0.399.bin (requires fast-transformers package, which is outdated as of Jul. 2024)
  • Token vocabulary: Revamped MIDI-derived events (REMI) with slight modifications
  • Training dataset: AILabs.tw Pop1K7 (1,747 songs)
  • Training sequence length: 3072
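
To give a rough idea of what a REMI-style token sequence looks like, here is a minimal sketch that turns a few notes into bar, position, and note events. It follows the general REMI idea only: the event names (`Bar`, `Position_*`, `Note-Pitch_*`, `Note-Velocity_*`, `Note-Duration_*`), the 16-step grid, and the `Note` class are illustrative assumptions, not the modified vocabulary actually used by these checkpoints. For the real vocabulary, refer to the implementation repo.

```python
from dataclasses import dataclass

# Assumption: a 16-step grid per bar. The released models may use a different grid.
POSITIONS_PER_BAR = 16


@dataclass
class Note:
    """Hypothetical note container used only for this illustration."""
    bar: int       # 0-based bar index
    position: int  # 0-based grid step within the bar
    pitch: int     # MIDI pitch number
    velocity: int  # MIDI velocity
    duration: int  # length in grid steps


def notes_to_remi(notes):
    """Convert notes to a flat list of REMI-style event strings."""
    events = []
    current_bar = -1
    current_pos = None
    for note in sorted(notes, key=lambda n: (n.bar, n.position, n.pitch)):
        if not 0 <= note.position < POSITIONS_PER_BAR:
            raise ValueError(f"position out of range: {note.position}")
        # Emit one Bar event per bar boundary crossed (empty bars included).
        while current_bar < note.bar:
            events.append("Bar")
            current_bar += 1
            current_pos = None
        # Emit a Position event only when the onset position changes.
        if note.position != current_pos:
            events.append(f"Position_{note.position}")
            current_pos = note.position
        events.append(f"Note-Pitch_{note.pitch}")
        events.append(f"Note-Velocity_{note.velocity}")
        events.append(f"Note-Duration_{note.duration}")
    return events


def build_vocab(events):
    """Map each distinct event string to an integer token id."""
    return {event: idx for idx, event in enumerate(sorted(set(events)))}


if __name__ == "__main__":
    demo = [Note(0, 0, 60, 80, 4), Note(0, 0, 64, 80, 4), Note(1, 8, 67, 90, 2)]
    tokens = notes_to_remi(demo)
    vocab = build_vocab(tokens)
    print(tokens)
    print([vocab[t] for t in tokens])
```

In this scheme, notes that start at the same position share a single `Position_*` event, which keeps sequences shorter. A full piece is then one long token sequence that is cropped or padded to the training sequence length.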

BibTeX

If you find the materials useful, please consider citing our work:

@inproceedings{wu2023compembellish,
  title={{Compose \& Embellish}: Well-Structured Piano Performance Generation via A Two-Stage Approach},
  author={Wu, Shih-Lun and Yang, Yi-Hsuan},
  booktitle={Proc. Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2023},
  url={https://arxiv.org/pdf/2209.08212.pdf}
}