VCTK
VCTK is an open English speech corpus. We provide examples for building Transformer models on this dataset.
Data preparation
Download the data, create splits, and generate audio manifests with
python -m examples.speech_synthesis.preprocessing.get_vctk_audio_manifest \
--output-data-root ${AUDIO_DATA_ROOT} \
--output-manifest-root ${AUDIO_MANIFEST_ROOT}
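Before moving on, it can help to check that the manifests were written. The following is a minimal sketch, not part of the example scripts: it assumes the manifests are tab-separated files with a header row, named `<split>.audio.tsv` under `${AUDIO_MANIFEST_ROOT}` (the naming used by the denoising command in this document). The helper name `summarize_manifest` is hypothetical, and no particular column names are assumed.

```python
import csv
import os
from pathlib import Path


def summarize_manifest(tsv_path):
    """Return (column_names, row_count) for a tab-separated manifest file.

    Assumes the first line is a header row; this is an assumption about
    the manifest layout, not something this document specifies.
    """
    with open(tsv_path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE treats quote characters inside transcripts as plain text.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader, [])
        # Count only non-empty data rows.
        row_count = sum(1 for row in reader if row)
    return header, row_count


if __name__ == "__main__":
    # AUDIO_MANIFEST_ROOT is the same variable passed to the command above.
    root = os.environ.get("AUDIO_MANIFEST_ROOT")
    if root:
        for split in ("train", "dev", "test"):
            path = Path(root) / f"{split}.audio.tsv"  # assumed file naming
            if path.exists():
                header, n = summarize_manifest(path)
                print(f"{split}: {n} rows, columns = {header}")
            else:
                print(f"{split}: {path} not found")
```

If a split is reported as missing or empty, rerun the manifest step before extracting features.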
Then, extract log-Mel spectrograms, generate a feature manifest, and create a data configuration YAML with
python -m examples.speech_synthesis.preprocessing.get_feature_manifest \
--audio-manifest-root ${AUDIO_MANIFEST_ROOT} \
--output-root ${FEATURE_MANIFEST_ROOT} \
--ipa-vocab --use-g2p
where we use phoneme inputs (--ipa-vocab --use-g2p) as an example.
To denoise audio and trim leading/trailing silence using signal-processing-based VAD (voice activity detection), run
for SPLIT in dev test train; do
python -m examples.speech_synthesis.preprocessing.denoise_and_vad_audio \
--audio-manifest ${AUDIO_MANIFEST_ROOT}/${SPLIT}.audio.tsv \
--output-dir ${PROCESSED_DATA_ROOT} \
--denoise --vad --vad-agg-level 3
done
Training
(Please refer to the LJSpeech example.)
Inference
(Please refer to the LJSpeech example.)
Automatic Evaluation
(Please refer to the LJSpeech example.)
Results
| --arch | Params | Test MCD | Model |
|---|---|---|---|
| tts_transformer | 54M | 3.4 | Download |
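MCD in the table is mel-cepstral distortion, measured in dB; lower is better. For readers unfamiliar with the metric, here is a minimal sketch of its commonly used definition, assuming the reference and synthesized cepstral frames are already time-aligned and of equal length. The function names are hypothetical, and the evaluation pipeline referenced in the LJSpeech example may differ in details such as alignment or the number of coefficients used.

```python
import math


def frame_mcd(ref, syn, skip_c0=True):
    """Mel-cepstral distortion (dB) between two cepstral vectors.

    Uses the common form (10 / ln 10) * sqrt(2 * sum_d (ref_d - syn_d)^2).
    With skip_c0=True the 0th coefficient (overall energy) is excluded.
    """
    if len(ref) != len(syn):
        raise ValueError("cepstral vectors must have the same length")
    start = 1 if skip_c0 else 0
    sq_dist = sum((r - s) ** 2 for r, s in zip(ref[start:], syn[start:]))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * sq_dist)


def mean_mcd(ref_frames, syn_frames):
    """Average frame-level MCD over two aligned, equal-length sequences."""
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("sequences must be non-empty and equally long")
    total = sum(frame_mcd(r, s) for r, s in zip(ref_frames, syn_frames))
    return total / len(ref_frames)
```

Identical frames give an MCD of 0 dB, and the value grows with the Euclidean distance between the cepstral vectors.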