---
license: mit
datasets:
- ShoukanLabs/AniSpeech
- vctk
- blabble-io/libritts_r
language:
- en
pipeline_tag: text-to-speech
---
A StyleTTS2 fine-tune, designed for expressiveness.
Vokan Samples!
Acknowledgements!
- **[DagsHub](https://dagshub.com):** Special thanks to DagsHub for sponsoring GPU compute resources as well as offering an amazing versioning service, enabling efficient model training and development. A shoutout to Dean in particular!
- **[camenduru](https://github.com/camenduru):** Thanks to camenduru for their expertise in cloud infrastructure and model training, which played a crucial role in the development of Vokan! Please give them a follow!

Conclusion!
V2 is currently in the works, aiming to be bigger and better in every way, including multilingual support! This is where you come in: if you have any large single-speaker datasets you'd like to contribute, in any language, you can add them to our **Vokan dataset**, a large **community dataset** that combines many smaller single-speaker datasets into one big multispeaker one. You can upload your Uberduck or [FakeYou](https://fakeyou.com/) compliant datasets via the **[Vokan](https://huggingface.co/ShoukanLabs/Vokan)** bot on the **[ShoukanLabs Discord Server](https://discord.gg/hdVeretude)**. The more data we have, the better the models we produce will be!

Citations!
```citations
@misc{li2023styletts,
      title={StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models},
      author={Yinghao Aaron Li and Cong Han and Vinay S. Raghavan and Gavin Mischler and Nima Mesgarani},
      year={2023},
      eprint={2306.07691},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}

@misc{zen2019libritts,
      title={LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech},
      author={Heiga Zen and Viet Dang and Rob Clark and Yu Zhang and Ron J. Weiss and Ye Jia and Zhifeng Chen and Yonghui Wu},
      year={2019},
      eprint={1904.02882},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}

Christophe Veaux, Junichi Yamagishi, Kirsten MacDonald,
"CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit",
The Centre for Speech Technology Research (CSTR), University of Edinburgh
```

License!
```
MIT
```