Amphion Vocoder Pretrained Models

We provide a pretrained DiffWave checkpoint, trained on 125 hours of speech data and 80 hours of singing voice data.

Quick Start

To use the pretrained vocoder, run the following commands:

Step1: Download the checkpoint

git lfs install
git clone https://huggingface.co/amphion/diffwave
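If Git LFS is not active when cloning, the repository may contain small pointer files instead of the actual weights. The following is a minimal sketch for detecting that case. The helper name `is_lfs_pointer` and the file path in the usage comment are hypothetical; the check relies only on the Git LFS pointer format, whose first line starts with `version https://git-lfs.github.com/spec/v1`.

```shell
# Hypothetical helper: returns 0 if the file looks like a Git LFS pointer
# (the real content was not downloaded), non-zero otherwise.
is_lfs_pointer() {
    # Pointer files are small text files whose first line names the LFS spec.
    head -c 200 "$1" 2>/dev/null | head -n 1 \
        | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Example usage (the file name is a placeholder, not a known checkpoint file):
# if is_lfs_pointer diffwave/path/to/weights; then
#     echo "Pointer only; run 'git lfs pull' inside the diffwave directory."
# fi
```

If the check reports a pointer, running `git lfs pull` inside the cloned directory should fetch the real files.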

Step2: Clone the Amphion source code from GitHub

git clone https://github.com/open-mmlab/Amphion.git

Step3: Specify the checkpoint's path

Create a symbolic link to the checkpoint downloaded in the first step:

cd Amphion
mkdir -p ckpts/vocoder
ln -s "$(realpath ../diffwave/diffwave)" ckpts/vocoder/diffwave
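A symbolic link with a relative target is resolved relative to the directory containing the link, not the current directory, which is why the command above passes an absolute path from `realpath`. As a rough sanity check, the sketch below confirms that a path is a symbolic link and that its target exists. The helper name `check_ckpt_link` is hypothetical; pass it the path of the link you just created.

```shell
# Hypothetical helper: succeeds only if the path is a symbolic link
# whose target exists.
check_ckpt_link() {
    if [ ! -L "$1" ]; then
        echo "not a symbolic link: $1" >&2
        return 1
    fi
    # -e follows the link, so it fails for a dangling link.
    if [ ! -e "$1" ]; then
        echo "dangling link (target missing): $1" >&2
        return 1
    fi
    echo "ok: $1"
}

# Example usage:
# check_ckpt_link <path-to-the-link-created-above>
```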

Step4: Inference

To run analysis-synthesis on a processed dataset, raw waveforms, or predicted mel spectrograms, follow the inference part of this recipe:

sh egs/vocoder/diffusion/diffwave/run.sh --stage 3 \
    --infer_mode [Your chosen inference mode] \
    --infer_datasets [Datasets to run inference on, needed when infer_from_dataset] \
    --infer_feature_dir [Path to your predicted acoustic features, needed when infer_from_feature] \
    --infer_audio_dir [Path to your audio files, needed when infer_from_audio] \
    --infer_expt_dir Amphion/ckpts/vocoder/[YourExptName] \
    --infer_output_dir Amphion/ckpts/vocoder/[YourExptName]/result
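
Each inference mode needs exactly one of the three input options. The sketch below encodes that pairing as read off the bracketed notes in the command above, taking the audio mode to be spelled `infer_from_audio`. The helper name `required_option_for_mode` is hypothetical and is not part of `run.sh`; it only illustrates which option goes with which mode.

```shell
# Hypothetical helper: print the option that must accompany a given
# --infer_mode value, following the bracketed notes in the command above.
required_option_for_mode() {
    case "$1" in
        infer_from_dataset) echo "--infer_datasets" ;;
        infer_from_feature) echo "--infer_feature_dir" ;;
        infer_from_audio)   echo "--infer_audio_dir" ;;
        *)
            echo "unknown inference mode: $1" >&2
            return 1
            ;;
    esac
}
```

For example, `required_option_for_mode infer_from_audio` prints `--infer_audio_dir`, so a run in that mode would supply `--infer_audio_dir` and could omit the other two input options.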