Update README.md
README.md
@@ -5,7 +5,7 @@ tags:
- audio-to-audio
language: en
datasets:
- VCTK_DEMAND
license: cc-by-4.0
---

@@ -13,21 +13,67 @@ license: cc-by-4.0

### `wyz/vctk_bsrnn_large_double_causal`

This model was trained by wyz based on the universal_se_v1 recipe in [espnet](https://github.com/espnet/espnet/).

### Demo: How to use in ESPnet2

Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
if you haven't done that already.

To use the model from Python, you can use the following code:

```python
import soundfile as sf
from espnet2.bin.enh_inference import SeparateSpeech

# Download and load the pretrained model
model = SeparateSpeech.from_pretrained(
    model_tag="wyz/vctk_bsrnn_large_double_causal",
    normalize_output_wav=True,
    device="cuda",  # use "cpu" if no GPU is available
)
# Alternatively, load a model that has already been downloaded
# model = SeparateSpeech(
#     train_config="exp_vctk/enh_train_enh_bsrnn_large_double_raw/config.yaml",
#     model_file="exp_vctk/enh_train_enh_bsrnn_large_double_raw/xxxx.pth",
#     normalize_output_wav=True,
#     device="cuda",
# )

# Read a noisy recording and add a leading batch axis before enhancement;
# [0] selects the first returned output
audio, fs = sf.read("/path/to/noisy/utt1.flac")
enhanced = model(audio[None, :], fs=fs)[0]
```
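The snippet above stops once `enhanced` is computed. Here is a minimal sketch for saving it as a 16-bit mono WAV file. It assumes that `enhanced` is a float array of shape (1, num_samples) with values in [-1, 1], and it uses only the standard-library `wave` module and NumPy. The helper `write_wav_pcm16` is hypothetical and not part of ESPnet. `sf.write` from the already-imported soundfile module is an equally reasonable choice.

```python
import wave

import numpy as np


def write_wav_pcm16(path: str, waveform: np.ndarray, fs: int) -> None:
    """Write a single float waveform (values in [-1, 1]) as a 16-bit PCM mono WAV.

    Hypothetical helper for illustration; not part of ESPnet.
    """
    # Drop singleton axes, e.g. a leading batch axis of size 1.
    x = np.asarray(waveform, dtype=np.float64).squeeze()
    if x.ndim != 1:
        raise ValueError(f"expected a single mono waveform, got shape {x.shape}")
    # Clip before scaling so out-of-range samples saturate instead of wrapping.
    pcm = np.round(np.clip(x, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)  # mono
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(int(fs))
        wf.writeframes(pcm.tobytes())  # WAV stores little-endian samples


# Usage with the snippet above (assumes `enhanced` has shape (1, num_samples)):
# write_wav_pcm16("enhanced.wav", enhanced, fs)
```

Clipping before the int16 conversion is the main design choice. Any values outside [-1, 1] saturate rather than wrap around to the opposite sign.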

<!-- Generated by ./scripts/utils/show_enh_score.sh -->
# RESULTS
## Environments
- date: `Wed Feb 28 12:11:05 EST 2024`
- python version: `3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]`
- espnet version: `espnet 202304`
- pytorch version: `pytorch 2.0.1+cu118`
- Git hash: `443028662106472c60fe8bd892cb277e5b488651`
- Commit date: `Thu May 11 03:32:59 2023 +0000`
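You can compare your own setup with the Environments list above using a small sketch like the one below, which collects comparable version strings. The helper name `describe_environment` is hypothetical. It reads installed-package versions via `importlib.metadata` (Python 3.8+), so the exact strings may be formatted differently from those printed by `show_enh_score.sh`.

```python
import platform
from importlib import metadata


def describe_environment(packages=("espnet", "torch")) -> dict:
    """Collect version strings comparable to the Environments list (hypothetical helper)."""
    info = {"python version": platform.python_version()}
    for name in packages:
        try:
            info[f"{name} version"] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this interpreter.
            info[f"{name} version"] = "not installed"
    return info


if __name__ == "__main__":
    # Print in the same bullet style as the Environments list.
    for key, value in describe_environment().items():
        print(f"- {key}: `{value}`")
```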

## enhanced_test_16k

|dataset|PESQ_WB|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
|---|---|---|---|---|---|---|---|---|---|---|
|chime4_et05_real_isolated_6ch_track|1.13|46.06|-4.10|-4.10|0.00|-31.50|2.32|2.75|3.26|3.07|
|chime4_et05_simu_isolated_6ch_track|1.14|63.98|3.58|3.58|0.00|-2.04|2.19|2.54|3.36|2.75|
|dns20_tt_synthetic_no_reverb|2.18|93.14|13.28|13.28|0.00|12.47|3.05|3.46|3.71|3.73|
|reverb_et_real_8ch_multich|1.10|59.67|3.75|3.75|0.00|0.44|2.28|2.62|3.50|3.27|
|reverb_et_simu_8ch_multich|1.61|83.19|9.07|9.07|0.00|-10.76|2.84|3.23|3.71|3.62|
|whamr_tt_mix_single_reverb_max_16k|1.24|76.23|4.73|4.73|0.00|0.59|2.32|2.66|3.53|3.19|

## enhanced_test_48k

|dataset|STOI|SAR|SDR|SIR|SI_SNR|OVRL|SIG|BAK|P808_MOS|
|---|---|---|---|---|---|---|---|---|---|
|vctk_noisy_tt_2spk|94.80|19.49|19.49|0.00|18.45|3.09|3.42|3.92|3.48|
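The SI_SNR column reports the scale-invariant signal-to-noise ratio in dB. Here is a minimal NumPy sketch of the commonly used definition:

1. Both signals are made zero-mean.
2. The estimate is projected onto the reference to obtain the target component.
3. The remainder is treated as noise.

This is an illustration only, not the scoring code behind these tables. The function name `si_snr` and the `eps` guard are assumptions.

```python
import numpy as np


def si_snr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-noise ratio in dB for two 1-D signals."""
    est = np.asarray(estimate, dtype=np.float64)
    ref = np.asarray(reference, dtype=np.float64)
    # Remove DC offsets so the measure ignores constant shifts.
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference: the optimally scaled target component.
    scale = np.dot(est, ref) / (np.dot(ref, ref) + eps)
    target = scale * ref
    # Everything not explained by the scaled reference counts as error.
    noise = est - target
    return float(10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps)))
```

The reference is rescaled to best match the estimate. As a result, multiplying the estimate by a constant leaves the score unchanged, which makes the metric insensitive to output gain (for example, gain changes introduced by `normalize_output_wav=True`).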

## ENH config