X-UMX trained with the jaCappella corpus for vocal ensemble separation

This model was trained by Tomohiko Nakamura using the codebase.
It was trained on the vocal ensemble separation task of the jaCappella dataset.
The paper was published at ICASSP 2023 and is also available on arXiv.

License

See the jaCappella dataset page.

Citation

See the jaCappella dataset page.

Configuration

data:
  num_workers: 12
  sample_rate: 48000
  samples_per_track: 13
  seed: 42
  seq_dur: 6.0
  source_augmentations:
  - gain
  sources:
  - vocal_percussion
  - bass
  - alto
  - tenor
  - soprano
  - lead_vocal
model:
  bandwidth: 16000
  bidirectional: true
  hidden_size: 512
  in_chan: 4096
  nb_channels: 1
  nhop: 1024
  pretrained: null
  spec_power: 1
  window_length: 4096
optim:
  lr: 0.001
  lr_decay_gamma: 0.3
  lr_decay_patience: 80
  optimizer: adam
  patience: 1000
  weight_decay: 1.0e-05
training:
  batch_size: 16
  epochs: 1000
  loss_combine_sources: true
  loss_use_multidomain: true
  mix_coef: 10.0
  val_dur: 80.0
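
To make the time-frequency settings above easier to read, the following minimal sketch derives a few quantities from them. All numeric inputs are copied from the configuration. The variable names are illustrative, and two conventions are assumptions rather than facts taken from the training code: that bins at or below `bandwidth` are the ones kept (as in Open-Unmix style models), and that the STFT is centered, so an excerpt of N samples yields N // nhop + 1 frames.

```python
# Values copied from the configuration above.
sample_rate = 48000    # Hz
window_length = 4096   # STFT window size in samples
nhop = 1024            # STFT hop size in samples
seq_dur = 6.0          # training excerpt length in seconds
bandwidth = 16000      # Hz

# A one-sided STFT has window_length // 2 + 1 frequency bins.
n_bins = window_length // 2 + 1
bin_width_hz = (sample_rate / 2) / (n_bins - 1)

# Assumption: the bins at or below `bandwidth` are the ones kept.
n_bins_below_bandwidth = int(bandwidth // bin_width_hz) + 1

# Length of one training excerpt in samples.
samples_per_excerpt = int(seq_dur * sample_rate)

# Assumption: centered STFT, which gives N // nhop + 1 frames.
frames_per_excerpt = samples_per_excerpt // nhop + 1

window_ms = 1000 * window_length / sample_rate
hop_ms = 1000 * nhop / sample_rate

print(f"frequency bins: {n_bins} ({bin_width_hz:.3f} Hz each)")
print(f"bins at or below {bandwidth} Hz: {n_bins_below_bandwidth}")
print(f"samples per excerpt: {samples_per_excerpt}")
print(f"frames per excerpt: {frames_per_excerpt}")
print(f"window: {window_ms:.1f} ms, hop: {hop_ms:.1f} ms")
```

Under these assumptions, each 6-second excerpt at 48 kHz spans 288000 samples and 282 frames, with a window of about 85.3 ms and a hop of about 21.3 ms.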

Results (SI-SDR [dB]) on vocal ensemble separation

Method	Lead vocal	Soprano	Alto	Tenor	Bass	Vocal percussion
X-UMX	7.5	10.7	13.5	10.2	9.1	21.0
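
For readers unfamiliar with the metric, the sketch below computes SI-SDR (scale-invariant signal-to-distortion ratio) for a single pair of signals in pure Python. It follows the commonly used definition: project the estimate onto the reference, then compare the energy of that projection with the energy of the residual. The function name `si_sdr` and the `eps` stabilizer are illustrative choices, and this is not necessarily the evaluation script used to produce the numbers above (for example, some implementations also remove the mean of each signal first).

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Return the SI-SDR in dB between two equal-length 1-D signals."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")

    # Optimal scaling of the reference: alpha = <estimate, reference> / ||reference||^2
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / (ref_energy + eps)

    # Scaled target and residual distortion.
    target = [alpha * r for r in reference]
    noise = [e - t for e, t in zip(estimate, target)]

    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)

    # eps keeps the ratio finite when the estimate is a perfect scaled copy.
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


# Toy signals: the distortion is orthogonal to the reference and has equal energy.
reference = [1.0, 0.0]
estimate = [1.0, 1.0]
rescaled = [3.0 * x for x in estimate]

print(round(si_sdr(estimate, reference), 3))
print(round(si_sdr(rescaled, reference), 3))
```

Rescaling the estimate leaves the value essentially unchanged, which is the property that distinguishes SI-SDR from plain SDR. Higher values indicate better separation.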