Upload 2 files
README.md
CHANGED
@@ -1,3 +1,95 @@
---
license: mit
---
+
+<style>
+
+.TitleContainer {
+  background-color: #ffff;
+  margin-bottom: 0rem;
+  margin-left: auto;
+  margin-right: auto;
+  width: 40%;
+  height: 30%;
+  border-radius: 10rem;
+  border: 0.5vw solid #ff593e;
+  transition: .6s;
+}
+
+.TitleContainer:hover {
+  transform: scale(1.05);
+}
+
+.VokanLogo {
+  margin: auto;
+  display: block;
+}
+
+</style>
+
+<hr>
+
+<div class="TitleContainer" align="center">
+<!--<img src="https://huggingface.co/ShoukanLabs/Vokan/resolve/main/Vokan.gif" class="VokanLogo">-->
+<img src="https://cdn.discordapp.com/attachments/1052446505790865439/1219207053067948072/Vokan.gif?ex=660a760d&is=65f8010d&hm=81a3f63fb38f65ed641ffcd13d162de76047ea26bbacc8b0d7325e88e2c4d59a&" class="VokanLogo">
+</div>
+
+<p align="center" style="font-size: 1vw; font-weight: bold; color: #ff593e;">A StyleTTS2 fine-tune, designed for expressiveness.</p>
+
+<hr>
+
+Vokan features:
+
+- A diverse dataset for more authentic zero-shot performance
+- Training on 6+ days' worth of audio from 672 diverse and expressive speakers
+- Training on 1x H100 for 300 hours and 1x 3090 for an additional 600 hours
+
+### Audio Examples
+
+<audio controls> <source src="" type="audio/wav"> Your browser does not support the audio embed. </audio>
+
+### Demo Spaces
+Coming soon...
+
+## This model was made possible thanks to
+- [DagsHub](https://dagshub.com), who sponsored us with GPU compute (with special thanks to Dean!)
+- [camenduru](https://github.com/camenduru), who assisted with cloud infrastructure and model training
+
+<hr>
+
+<a href="https://discord.gg/5bq9HqVhsJ"><img src="https://img.shields.io/badge/find_us_at_the-ShoukanLabs_Discord-invite?style=flat-square&logo=discord&logoColor=%23ffffff&labelColor=%235865F2&color=%23ffffff" width="320" alt="discord"></a>
+<!--<a align="left" style="font-size: 1.3rem; font-weight: bold; color: #5662f6;" href="https://discord.gg/5bq9HqVhsJ">find us on Discord</a>-->
+
+## Citations
+
+```citations
+@misc{li2023styletts,
+      title={StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models},
+      author={Yinghao Aaron Li and Cong Han and Vinay S. Raghavan and Gavin Mischler and Nima Mesgarani},
+      year={2023},
+      eprint={2306.07691},
+      archivePrefix={arXiv},
+      primaryClass={eess.AS}
+}
+
+@misc{zen2019libritts,
+      title={LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech},
+      author={Heiga Zen and Viet Dang and Rob Clark and Yu Zhang and Ron J. Weiss and Ye Jia and Zhifeng Chen and Yonghui Wu},
+      year={2019},
+      eprint={1904.02882},
+      archivePrefix={arXiv},
+      primaryClass={cs.SD}
+}
+
+Christophe Veaux, Junichi Yamagishi, Kirsten MacDonald,
+"CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit",
+The Centre for Speech Technology Research (CSTR),
+University of Edinburgh
+```
+
+## License
+```
+MIT
+```
+
+Stay tuned for Vokan V2!
Vokan.gif
ADDED