Text-to-Speech
English
Korakoe committed on
Commit
097c9df
1 Parent(s): 783509b

Upload 2 files

Files changed (2)
  1. README.md +92 -0
  2. Vokan.gif +0 -0
README.md CHANGED
@@ -1,3 +1,95 @@
  ---
  license: mit
  ---
+
+ <style>
+
+ .TitleContainer {
+ background-color: #ffff;
+ margin-bottom: 0rem;
+ margin-left: auto;
+ margin-right: auto;
+ width: 40%;
+ height: 30%;
+ border-radius: 10rem;
+ border: 0.5vw solid #ff593e;
+ transition: .6s;
+ }
+
+ .TitleContainer:hover {
+ transform: scale(1.05);
+ }
+
+ .VokanLogo {
+ margin: auto;
+ display: block;
+ }
+
+ </style>
+
+ <hr>
+
+ <div class="TitleContainer" align="center">
+ <!--<img src="https://huggingface.co/ShoukanLabs/Vokan/resolve/main/Vokan.gif" class="VokanLogo">-->
+ <img src="https://cdn.discordapp.com/attachments/1052446505790865439/1219207053067948072/Vokan.gif?ex=660a760d&is=65f8010d&hm=81a3f63fb38f65ed641ffcd13d162de76047ea26bbacc8b0d7325e88e2c4d59a&" class="VokanLogo">
+ </div>
+
+ <p align="center" style="font-size: 1vw; font-weight: bold; color: #ff593e;">A StyleTTS2 fine-tune, designed for expressiveness.</p>
+
+ <hr>
+
+ Vokan features:
+
+ - A diverse dataset for more authentic zero-shot performance
+ - Training on 6+ days' worth of audio, with 672 diverse and expressive speakers
+ - Training on 1x H100 for 300 hours and 1x 3090 for an additional 600 hours
+
+ ### Audio Examples
+
+ <audio controls> <source src="" type="audio/wav"> Your browser does not support the audio embed. </audio>
+
+ ### Demo Spaces
+ Coming soon...
+
+ ## This model was made possible thanks to
+ - [DagsHub](https://dagshub.com), who sponsored us with their GPU compute (with special thanks to Dean!)
+ - [camenduru](https://github.com/camenduru), who assisted with cloud infrastructure and model training
+
+ <hr>
+
+ <a href="https://discord.gg/5bq9HqVhsJ"><img src="https://img.shields.io/badge/find_us_at_the-ShoukanLabs_Discord-invite?style=flat-square&logo=discord&logoColor=%23ffffff&labelColor=%235865F2&color=%23ffffff" width="320" alt="discord"></a>
+ <!--<a align="left" style="font-size: 1.3rem; font-weight: bold; color: #5662f6;" href="https://discord.gg/5bq9HqVhsJ">find us on Discord</a>-->
+
+ ## Citations
+
+ ```citations
+ @misc{li2023styletts,
+ title={StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models},
+ author={Yinghao Aaron Li and Cong Han and Vinay S. Raghavan and Gavin Mischler and Nima Mesgarani},
+ year={2023},
+ eprint={2306.07691},
+ archivePrefix={arXiv},
+ primaryClass={eess.AS}
+ }
+
+ @misc{zen2019libritts,
+ title={LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech},
+ author={Heiga Zen and Viet Dang and Rob Clark and Yu Zhang and Ron J. Weiss and Ye Jia and Zhifeng Chen and Yonghui Wu},
+ year={2019},
+ eprint={1904.02882},
+ archivePrefix={arXiv},
+ primaryClass={cs.SD}
+ }
+
+ Christophe Veaux, Junichi Yamagishi, Kirsten MacDonald,
+ "CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit",
+ The Centre for Speech Technology Research (CSTR),
+ University of Edinburgh
+ ```
+
+ ## License
+ ```
+ MIT
+ ```
+
+ Stay tuned for Vokan V2!
Vokan.gif ADDED