---
license: cc-by-nc-4.0
language:
- ru
library_name: nemo
tags:
- text-to-speech
- tts
---
### How to use
See an example of the inference pipeline for Russian TTS (G2P + FastPitch + HifiGAN) in this [notebook](https://github.com/bene-ges/nemo_compatible/blob/main/notebooks/Russian_TTS_with_IPA_G2P_FastPitch_and_HifiGAN.ipynb).
Alternatively, use this [bash script](https://github.com/bene-ges/nemo_compatible/blob/main/scripts/tts/ru_ipa_fastpitch_hifigan/test.sh).
### Input
This model is intended to be used in a G2P + FastPitch + HifiGAN pipeline (see above).
If run independently, it expects text converted to IPA-like transcriptions. See this [g2p model](https://huggingface.co/bene-ges/ru_g2p_ipa_bert_large) for converting plain Russian words to phonemes, or this newer [IPA-compatible G2P tool](https://github.com/omogr/omogre), which can resolve ambiguity at the sentence level.
If you feed plain text directly, the model will still run, but the quality will be low.
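Because plain text is accepted silently and only degrades quality, it can help to check the input before synthesis. The following is a minimal sketch, not part of the model or of NeMo: the helper names are hypothetical, and it assumes that any Cyrillic letter in the input means the text has not yet been passed through a G2P step.

```python
import re
import warnings

# Assumption: any character from the Unicode Cyrillic block means the text
# is still plain Russian and has not been converted to phonemes.
_CYRILLIC = re.compile(r"[\u0400-\u04FF]")


def looks_like_plain_russian(text: str) -> bool:
    """Return True if the text still contains Cyrillic characters."""
    return _CYRILLIC.search(text) is not None


def check_tts_input(text: str) -> str:
    """Hypothetical guard: warn when the input was probably not run through G2P."""
    if looks_like_plain_russian(text):
        warnings.warn(
            "Input contains Cyrillic letters; run a G2P model first "
            "or expect low synthesis quality."
        )
    return text
```

For example, `looks_like_plain_russian("привет")` returns `True`, while a Latin/IPA string such as `"prʲɪvʲet"` (an illustrative string, not actual G2P output) returns `False`.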
### Output
This model generates mel spectrograms.
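Since the output is a mel spectrogram rather than audio, a vocoder is needed to obtain a waveform. The sketch below shows one way this could look with the standard NeMo TTS classes. It makes several assumptions: that both checkpoints are available locally as `.nemo` files (the path arguments are placeholders), that the checkpoint's own tokenizer accepts the phoneme string through `parse`, and that a matching HifiGAN checkpoint is used. The notebook linked above remains the authoritative pipeline.

```python
def synthesize(phoneme_text, fastpitch_path, hifigan_path):
    """Sketch: phoneme string -> mel spectrogram -> waveform (NumPy array).

    The paths are hypothetical placeholders for local .nemo checkpoints.
    """
    # Imported lazily so that the function can be defined without NeMo installed.
    import torch
    from nemo.collections.tts.models import FastPitchModel, HifiGanModel

    spec_generator = FastPitchModel.restore_from(fastpitch_path).eval()
    vocoder = HifiGanModel.restore_from(hifigan_path).eval()

    with torch.no_grad():
        # Assumption: the input is already an IPA-like transcription.
        tokens = spec_generator.parse(phoneme_text)
        # FastPitch produces the mel spectrogram described in this section.
        spectrogram = spec_generator.generate_spectrogram(tokens=tokens)
        # HifiGAN turns the mel spectrogram into audio samples.
        audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)

    return audio.squeeze(0).cpu().numpy()
```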
## Training
The model was trained with the NeMo toolkit [1] for more than 1000 epochs.
The full training script is [here](https://github.com/bene-ges/nemo_compatible/blob/main/scripts/tts/ru_ipa_fastpitch_hifigan/train.sh).
### Datasets
This model was trained on the [RUSLAN](https://ruslan-corpus.github.io/) [2] corpus (single speaker, male voice), sampled at 22050 Hz.
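The sampling rate matters when saving synthesized audio: a file written with a different rate plays back at the wrong speed and pitch. As a minimal sketch using only the Python standard library, the helper below writes mono 16-bit PCM at 22050 Hz. The function name is hypothetical, and it assumes the vocoder returns float samples in the range [-1, 1] at the same rate as the training data.

```python
import math
import struct
import wave

SAMPLE_RATE = 22050  # Hz, the sampling rate of the training corpus


def save_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write float samples in [-1, 1] as a mono 16-bit PCM WAV file."""
    pcm = bytearray()
    for sample in samples:
        # Clip to the valid range before scaling to 16-bit integers.
        clipped = max(-1.0, min(1.0, float(sample)))
        pcm += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(pcm))


if __name__ == "__main__":
    # Usage example: one second of a 440 Hz tone instead of real model output.
    tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
    save_wav("tone.wav", tone)
```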
## References
- [1] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
- [2] Gabdrakhmanov L., Garaev R., Razinkov E. (2019) RUSLAN: Russian Spoken Language Corpus for Speech Synthesis. In: Salah A., Karpov A., Potapova R. (eds) Speech and Computer. SPECOM 2019. Lecture Notes in Computer Science, vol 11658. Springer, Cham.