Merge branch 'main' of https://huggingface.co/ShoukanLabs/Vokan
- Model/config.yml +1 -1
- README.md +6 -3
Model/config.yml
CHANGED
@@ -62,7 +62,7 @@ model_params:
     dist:
       estimate_sigma_data: true
       mean: -3
-      sigma_data: .
+      sigma_data: .18
       std: 1
     embedding_mask_proba: 0.1
     transformer:
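For context on what the corrected value controls: `dist`/`estimate_sigma_data`-style options come from EDM-style diffusion preconditioning, where `sigma_data` is the assumed standard deviation of the clean data and scales the network's skip, output, and input paths. A minimal sketch of those coefficients, using the standard EDM formulas (Karras et al.) rather than this repo's actual code — the constant `0.18` here simply mirrors the value set in this commit, and YAML's `.18` does parse as the float `0.18`:

```python
import math

SIGMA_DATA = 0.18  # value written to Model/config.yml in this commit


def edm_scalings(sigma: float, sigma_data: float = SIGMA_DATA):
    """EDM preconditioning coefficients.

    c_skip scales the residual (skip) path, c_out the network output,
    and c_in the noisy input, keeping the effective training target
    roughly unit-variance at every noise level sigma.
    """
    denom = sigma ** 2 + sigma_data ** 2
    c_skip = sigma_data ** 2 / denom
    c_out = sigma * sigma_data / math.sqrt(denom)
    c_in = 1.0 / math.sqrt(denom)
    return c_skip, c_out, c_in


# At zero noise the denoiser passes the input straight through
# (c_skip == 1, c_out == 0); as sigma grows, the prediction comes
# increasingly from the network itself.
c_skip, c_out, c_in = edm_scalings(0.0)
```

When `sigma == sigma_data`, the skip and network paths are weighted equally (`c_skip == 0.5`), which is why choosing `sigma_data` close to the data's true standard deviation matters.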
README.md
CHANGED
@@ -7,6 +7,7 @@ datasets:
 language:
 - en
 pipeline_tag: text-to-speech
+base_model: yl4579/StyleTTS2-LibriTTS
 ---
 
 <style>
@@ -61,7 +62,7 @@ pipeline_tag: text-to-speech
 </div>
 
 **Vokan** is an advanced finetuned **StyleTTS2** model crafted for authentic and expressive zero-shot performance. Designed to serve as a better
-base model
+base model for further finetuning in the future!
 It leverages a diverse dataset and extensive training to generate high-quality synthesized speech.
 Trained on a combination of the AniSpeech, VCTK, and LibriTTS-R datasets, Vokan ensures authenticity and naturalness across various accents and contexts.
 With over 6+ days worth of audio data and 672 diverse and expressive speakers,
@@ -116,11 +117,13 @@ You can read more about it on our article on [DagsHub!](https://dagshub.com/blog
 
 V2 is currently in the works, aiming to be bigger and better in every way! Including multilingual support!
 This is where you come in, if you have any large single speaker datasets you'd like to contribute,
-in any
+in any language, you can contribute to our **Vokan dataset**. A large **community dataset** that combines a bunch of
 smaller single speaker datasets to create one big multispeaker one.
-You can upload your uberduck or
+You can upload your uberduck or FakeYou compliant datasets via the
 **[Vokan](https://huggingface.co/ShoukanLabs/Vokan)** bot on the **[ShoukanLabs Discord Server](https://discord.gg/hdVeretude)**.
 The more data we have, the better the models we produce will be!
+
+[This model is also available on DagsHub](https://dagshub.com/ShoukanLabs/Vokan)
 <hr>
 
 <p align="center", style="font-size: 2vw; font-weight: bold; color: #ff593e;">Citations!</p>
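The added `base_model: yl4579/StyleTTS2-LibriTTS` line lives in the card's YAML front matter (the block between the `---` fences), which the Hub uses to link Vokan back to its parent checkpoint. A stdlib-only sketch of reading such a field — real tooling would use `huggingface_hub.ModelCard` instead, and the sample card text below is an abbreviated stand-in, not the full README:

```python
import re


def front_matter_field(card_text: str, key: str):
    """Return the value of a top-level scalar key in the YAML front
    matter of a model card, or None if absent."""
    match = re.match(r"^---\n(.*?)\n---", card_text, re.DOTALL)
    if not match:
        return None
    for line in match.group(1).splitlines():
        k, sep, v = line.partition(":")
        if sep and k.strip() == key:
            return v.strip() or None
    return None


# Abbreviated stand-in for the card edited in this commit.
card = """---
language:
- en
pipeline_tag: text-to-speech
base_model: yl4579/StyleTTS2-LibriTTS
---

# Vokan
"""

base = front_matter_field(card, "base_model")
```

Here `base` comes back as `yl4579/StyleTTS2-LibriTTS`, which is exactly what the Hub's "finetuned from" link is built from.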