ShoukanLabs/Vokan

Text-to-Speech

English

Korakoe committed on Mar 20

Commit 097c9df • 1 Parent(s): 783509b

Upload 2 files

Files changed (2)

README.md +92 -0
Vokan.gif +0 -0

README.md CHANGED

@@ -1,3 +1,95 @@
 ---
 license: mit
 ---
+<style>
+  .TitleContainer {
+    background-color: #ffff;
+    margin-bottom: 0rem;
+    margin-left: auto;
+    margin-right: auto;
+    width: 40%;
+    height: 30%;
+    border-radius: 10rem;
+    border: 0.5vw solid #ff593e;
+    transition: .6s;
+  }
+  .TitleContainer:hover {
+    transform: scale(1.05);
+  }
+  .VokanLogo {
+    margin: auto;
+    display: block;
+  }
+</style>
+<hr>
+<div class="TitleContainer" align="center">
+      <!--<img src="https://huggingface.co/ShoukanLabs/Vokan/resolve/main/Vokan.gif" class="VokanLogo">-->
+      <img src="https://cdn.discordapp.com/attachments/1052446505790865439/1219207053067948072/Vokan.gif?ex=660a760d&is=65f8010d&hm=81a3f63fb38f65ed641ffcd13d162de76047ea26bbacc8b0d7325e88e2c4d59a&" class="VokanLogo">
+</div>
+<p align="center" style="font-size: 1vw; font-weight: bold; color: #ff593e;">A StyleTTS2 fine-tune, designed for expressiveness.</p>
+<hr>
+Vokan features:
+- A diverse dataset for more authentic zero-shot performance
+- Training on 6+ days' worth of audio, with 672 diverse and expressive speakers
+- Training on 1x H100 for 300 hours and 1x 3090 for an additional 600 hours
+### Audio Examples
+<audio controls> <source src="" type="audio/wav"> Your browser does not support the audio embed. </audio>
+### Demo Spaces
+Coming soon...
+## This model was made possible thanks to
+- [DagsHub](https://dagshub.com) who sponsored us with their GPU compute (with special thanks to Dean!)
+- [camenduru](https://github.com/camenduru), who assisted with cloud infrastructure and model training
+<hr>
+<a href="https://discord.gg/5bq9HqVhsJ"><img src="https://img.shields.io/badge/find_us_at_the-ShoukanLabs_Discord-invite?style=flat-square&logo=discord&logoColor=%23ffffff&labelColor=%235865F2&color=%23ffffff" width="320" alt="discord"></a>
+<!--<a align="left" style="font-size: 1.3rem; font-weight: bold; color: #5662f6;" href="https://discord.gg/5bq9HqVhsJ">find us on Discord</a>-->
+## Citations
+```citations
+@misc{li2023styletts,
+      title={StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models},
+      author={Yinghao Aaron Li and Cong Han and Vinay S. Raghavan and Gavin Mischler and Nima Mesgarani},
+      year={2023},
+      eprint={2306.07691},
+      archivePrefix={arXiv},
+      primaryClass={eess.AS}
+}
+@misc{zen2019libritts,
+      title={LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech},
+      author={Heiga Zen and Viet Dang and Rob Clark and Yu Zhang and Ron J. Weiss and Ye Jia and Zhifeng Chen and Yonghui Wu},
+      year={2019},
+      eprint={1904.02882},
+      archivePrefix={arXiv},
+      primaryClass={cs.SD}
+}
+Christophe Veaux,  Junichi Yamagishi, Kirsten MacDonald,
+"CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit",
+The Centre for Speech Technology Research (CSTR),
+University of Edinburgh
+```
+## License
+```
+MIT
+```
+Stay tuned for Vokan V2!

Vokan.gif ADDED