---
license: mit
language:
- ja
tags:
- music
- audio
- audio-to-audio
- SFI
datasets:
- MUSDB18-HQ
metrics:
- SDR
---

# Sampling-frequency-independent (SFI) Conv-TasNet trained with the MUSDB18-HQ dataset for music source separation

This model was proposed in [our IEEE/ACM Trans. ASLP paper](https://doi.org/10.1109/TASLP.2022.3203907). It handles untrained sampling frequencies by using sampling-frequency-independent convolutional layers based on a time-domain filter design, where the latent analog filter is a multiphase gammatone filter. It was trained by Tomohiko Nakamura using [the codebase](https://github.com/TomohikoNakamura/sfi_convtasnet). Although the model was trained only with 32 kHz-sampled data, it also works well at untrained sampling frequencies (e.g., 8 and 16 kHz).

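As a minimal sketch of what running at an untrained sampling frequency involves on the input side (the model itself is not loaded here; the rates and the random mixture are illustrative assumptions), the audio is simply resampled to the desired rate before inference:

```python
import numpy as np
from scipy.signal import resample_poly

# Illustrative input: 1 second of stereo audio at the 32 kHz training rate.
fs_train, fs_target = 32000, 8000
mixture = np.random.randn(2, fs_train).astype(np.float32)

# Resample to an untrained rate. The SFI convolutional layers are designed so
# that the same trained weights can be reused at fs_target without retraining.
mixture_8k = resample_poly(mixture, fs_target, fs_train, axis=-1)
print(mixture_8k.shape)  # (2, 8000)
```

The same preprocessing applies symmetrically on the output side if the separated stems are needed back at the original rate.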
# License

MIT

# Citation

Please cite the following paper.

```bibtex
@article{KSaito2022IEEEACMTASLP,
  author  = {Saito, Koichi and Nakamura, Tomohiko and Yatabe, Kohei and Saruwatari, Hiroshi},
  journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  title   = {Sampling-frequency-independent convolutional layer and its application to audio source separation},
  year    = {2022},
  month   = sep,
  volume  = {30},
  pages   = {2928--2943},
  doi     = {10.1109/TASLP.2022.3203907},
}
```

# Contents

- Four trained models (seeds 40, 42, 44, and 47)
- Evaluation results (JSON files obtained with the museval library)
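A short sketch of how the evaluation JSON files can be summarized, assuming museval's usual per-track layout (a `targets` list whose entries carry framewise `metrics`); the inline dictionary stands in for a real file read with `json.load`, and its SDR values are made up for illustration:

```python
import statistics

# Illustrative museval-style per-track result (normally obtained via
# json.load on one of the evaluation files in this repository).
example = {
    "targets": [
        {"name": "vocals",
         "frames": [{"metrics": {"SDR": 6.0}}, {"metrics": {"SDR": 5.0}}]},
        {"name": "drums",
         "frames": [{"metrics": {"SDR": 7.0}}, {"metrics": {"SDR": 6.5}}]},
    ]
}

def median_sdr(track: dict) -> dict:
    """Median framewise SDR per target, the usual summary statistic."""
    return {
        t["name"]: statistics.median(f["metrics"]["SDR"] for f in t["frames"])
        for t in track["targets"]
    }

print(median_sdr(example))  # {'vocals': 5.5, 'drums': 6.75}
```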