license: cc-by-nc-4.0

UTDUSS vocoder model

In this repo, we provide the model weights of the Descript Audio Codec (DAC) used for the Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge.

Prerequisites

The official dac library, which can be installed with the following command:

pip install descript-audio-codec

Provided weights

Vocoder task

| Model name on paper | Model name on this repo |
| --- | --- |
| 😀 | expresso_16k_2code.pth |
| 😀 w/o hyper-parameter tuning | expresso_16k_2code_official.pth |
| 😀 w/o data exclusion | expresso_16k_2code_wo_data.pth |
| 😀 w/o matching sampling rate | expresso_24k_2code_ab.pth |

Acoustic + Vocoder (TTS) task

Please note that the weights for the acoustic model are not provided.

| Model name on paper | Model name on this repo |
| --- | --- |
| 😀 | expresso_16k_2code.pth |
| 😀 w/o hyper-parameter tuning | expresso_16k_2code_official.pth |
| 😀 w/o data exclusion | expresso_16k_2code_wo_data.pth |
| 😀 w/o matching sampling rate | expresso_24k_2code_ab.pth |

Sample code

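from pathlib import Path

import dac
import torch

# URL of the checkpoint to download from this repo
model_url = "https://huggingface.co/sarulab-speech/UTDUSS-Vocoder/resolve/main/expresso_16k_2code.pth"

# Store the checkpoint under /tmp/utduss, keeping its original file name
model_path = Path(f"/tmp/utduss/{model_url.split('/')[-1]}")
model_path.parent.mkdir(parents=True, exist_ok=True)
torch.hub.download_url_to_file(model_url, str(model_path))

# Load the DAC model from the downloaded weights
model = dac.DAC.load(str(model_path))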
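
To switch between the provided checkpoints, the sample code above could be wrapped in a small helper. The following is only a sketch: the short keys in `WEIGHTS` and the function names `resolve_weight` and `load_vocoder` are hypothetical labels introduced here, not names from the paper or the dac library. The file names and base URL are the ones listed in this README.

```python
from pathlib import Path

# Base URL taken from the sample code above.
BASE_URL = "https://huggingface.co/sarulab-speech/UTDUSS-Vocoder/resolve/main"

# File names come from the tables above; the dictionary keys are
# hypothetical short labels chosen for this example.
WEIGHTS = {
    "proposed": "expresso_16k_2code.pth",
    "wo_hparam_tuning": "expresso_16k_2code_official.pth",
    "wo_data_exclusion": "expresso_16k_2code_wo_data.pth",
    "wo_matching_sr": "expresso_24k_2code_ab.pth",
}


def resolve_weight(key, cache_dir="/tmp/utduss"):
    """Return (download URL, local path) for one of the provided checkpoints."""
    filename = WEIGHTS[key]
    return f"{BASE_URL}/{filename}", Path(cache_dir) / filename


def load_vocoder(key, cache_dir="/tmp/utduss"):
    """Download the checkpoint if it is not cached yet, then load it with dac."""
    # Imported lazily so resolve_weight works without dac or torch installed.
    import dac
    import torch

    url, path = resolve_weight(key, cache_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        torch.hub.download_url_to_file(url, str(path))
    return dac.DAC.load(str(path))
```

For example, `load_vocoder("wo_matching_sr")` would fetch and load `expresso_24k_2code_ab.pth`, assuming network access and an installed `descript-audio-codec`.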

Contributors