license: cc-by-nc-4.0
UTDUSS vocoder model
This repository provides the model weights of the Descript Audio Codec (DAC) used for the Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge.
Prerequisites
The official DAC library, which can be installed with the following command:
pip install descript-audio-codec
Provided weights
Vocoder task
Model name in the paper | File name in this repo |
---|---|
π | expresso_16k_2code.pth |
π w/o hyper-parameter tuning | expresso_16k_2code_official.pth |
π w/o data exclusion | expresso_16k_2code_wo_data.pth |
π w/o matching sampling rate | expresso_24k_2code_ab.pth |
Acoustic + Vocoder (TTS) task
Please note that the weights for the acoustic model are not provided.
Model name in the paper | File name in this repo |
---|---|
π | expresso_16k_2code.pth |
π w/o hyper-parameter tuning | expresso_16k_2code_official.pth |
π w/o data exclusion | expresso_16k_2code_wo_data.pth |
π w/o matching sampling rate | expresso_24k_2code_ab.pth |
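Any of the four checkpoints listed in the tables can be fetched the same way. The following is a minimal sketch, assuming every file is served under the same `resolve/main/` URL pattern that the sample code uses for `expresso_16k_2code.pth`. The dictionary keys, the `checkpoint_location` helper, and the default cache directory are hypothetical names chosen for illustration; only the file names and the base URL come from this repository.

```python
from pathlib import Path

# Base URL taken from the sample code in this README.
REPO_URL = "https://huggingface.co/sarulab-speech/UTDUSS-Vocoder/resolve/main"

# File names from the tables above; the short keys are hypothetical labels.
CHECKPOINTS = {
    "proposed": "expresso_16k_2code.pth",
    "no_hparam_tuning": "expresso_16k_2code_official.pth",
    "no_data_exclusion": "expresso_16k_2code_wo_data.pth",
    "no_matched_sampling_rate": "expresso_24k_2code_ab.pth",
}


def checkpoint_location(key, cache_dir="/tmp/utduss"):
    """Return (download URL, local path) for one of the listed checkpoints."""
    filename = CHECKPOINTS[key]
    return f"{REPO_URL}/{filename}", Path(cache_dir) / filename


url, path = checkpoint_location("no_data_exclusion")
print(url)
print(path)
```

The returned URL and path can be passed to `torch.hub.download_url_to_file` and `dac.DAC.load` in place of the hard-coded values in the sample code.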
Sample code
import dac
import torch
from pathlib import Path

# Download the checkpoint into a local cache directory
model_url = "https://huggingface.co/sarulab-speech/UTDUSS-Vocoder/resolve/main/expresso_16k_2code.pth"
model_path = Path(f"/tmp/utduss/{model_url.split('/')[-1]}")
model_path.parent.mkdir(parents=True, exist_ok=True)
torch.hub.download_url_to_file(model_url, str(model_path))

# Load the weights with the official DAC library
model = dac.DAC.load(str(model_path))
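Once loaded, the model can act as a vocoder by turning discrete codes back into a waveform. The following is a minimal sketch, assuming the loaded object exposes the standard `descript-audio-codec` interface (`model.quantizer.from_codes` and `model.decode`) and that `codes` is an integer tensor of shape `(batch, n_codebooks, frames)`. The helper name `codes_to_waveform` is hypothetical, and how the codes are produced for the challenge tasks is not covered here.

```python
def codes_to_waveform(model, codes):
    """Decode discrete codec codes into audio with a loaded DAC model.

    Assumption: `model` follows the descript-audio-codec interface and
    `codes` has shape (batch, n_codebooks, frames).
    """
    # Look up the codebook entries and sum them into a continuous latent.
    z, _, _ = model.quantizer.from_codes(codes)
    # Run the decoder to obtain the waveform tensor.
    return model.decode(z)
```

For inference, the call would typically be wrapped in `torch.no_grad()` after `model.eval()`.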