Merge branch 'main' of https://huggingface.co/ShoukanLabs/Vokan
- Model/config.yml +1 -1
- README.md +6 -3
Model/config.yml
CHANGED
@@ -62,7 +62,7 @@ model_params:
     dist:
       estimate_sigma_data: true
       mean: -3
-      sigma_data: .
+      sigma_data: .18
       std: 1
     embedding_mask_proba: 0.1
     transformer:
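For context on what the corrected value controls: `dist`/`estimate_sigma_data`-style options come from EDM-style diffusion preconditioning, where `sigma_data` is the assumed standard deviation of the clean data and scales the network's skip, output, and input paths. A minimal sketch of those coefficients, using the standard EDM formulas (Karras et al.) rather than this repo's actual code — the constant `0.18` here simply mirrors the value set in this commit, and YAML's `.18` does parse as the float `0.18`:

```python
import math

SIGMA_DATA = 0.18  # value written to Model/config.yml in this commit


def edm_scalings(sigma: float, sigma_data: float = SIGMA_DATA):
    """EDM preconditioning coefficients.

    c_skip scales the residual (skip) path, c_out the network output,
    and c_in the noisy input, keeping the effective training target
    roughly unit-variance at every noise level sigma.
    """
    denom = sigma ** 2 + sigma_data ** 2
    c_skip = sigma_data ** 2 / denom
    c_out = sigma * sigma_data / math.sqrt(denom)
    c_in = 1.0 / math.sqrt(denom)
    return c_skip, c_out, c_in


# At zero noise the denoiser passes the input straight through
# (c_skip == 1, c_out == 0); as sigma grows, the prediction comes
# increasingly from the network itself.
c_skip, c_out, c_in = edm_scalings(0.0)
```

When `sigma == sigma_data`, the skip and network paths are weighted equally (`c_skip == 0.5`), which is why choosing `sigma_data` close to the data's true standard deviation matters.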
README.md
CHANGED
@@ -7,6 +7,7 @@ datasets:
 language:
 - en
 pipeline_tag: text-to-speech
+base_model: yl4579/StyleTTS2-LibriTTS
 ---
 
 <style>
@@ -61,7 +62,7 @@ pipeline_tag: text-to-speech
 </div>
 
 **Vokan** is an advanced finetuned **StyleTTS2** model crafted for authentic and expressive zero-shot performance. Designed to serve as a better
-base model
+base model for further finetuning in the future!
 It leverages a diverse dataset and extensive training to generate high-quality synthesized speech.
 Trained on a combination of the AniSpeech, VCTK, and LibriTTS-R datasets, Vokan ensures authenticity and naturalness across various accents and contexts.
 With over 6+ days worth of audio data and 672 diverse and expressive speakers,
@@ -116,11 +117,13 @@ You can read more about it on our article on [DagsHub!](https://dagshub.com/blog
 
 V2 is currently in the works, aiming to be bigger and better in every way! Including multilingual support!
 This is where you come in, if you have any large single speaker datasets you'd like to contribute,
-in any
+in any language, you can contribute to our **Vokan dataset**. A large **community dataset** that combines a bunch of
 smaller single speaker datasets to create one big multispeaker one.
-You can upload your uberduck or
+You can upload your uberduck or FakeYou compliant datasets via the
 **[Vokan](https://huggingface.co/ShoukanLabs/Vokan)** bot on the **[ShoukanLabs Discord Server](https://discord.gg/hdVeretude)**.
 The more data we have, the better the models we produce will be!
+
+[This model is also available on DagsHub](https://dagshub.com/ShoukanLabs/Vokan)
 <hr>
 
 <p align="center", style="font-size: 2vw; font-weight: bold; color: #ff593e;">Citations!</p>
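The added `base_model: yl4579/StyleTTS2-LibriTTS` line lives in the card's YAML front matter (the block between the `---` fences), which the Hub uses to link Vokan back to its parent checkpoint. A stdlib-only sketch of reading such a field — real tooling would use `huggingface_hub.ModelCard` instead, and the sample card text below is an abbreviated stand-in, not the full README:

```python
import re


def front_matter_field(card_text: str, key: str):
    """Return the value of a top-level scalar key in the YAML front
    matter of a model card, or None if absent."""
    match = re.match(r"^---\n(.*?)\n---", card_text, re.DOTALL)
    if not match:
        return None
    for line in match.group(1).splitlines():
        k, sep, v = line.partition(":")
        if sep and k.strip() == key:
            return v.strip() or None
    return None


# Abbreviated stand-in for the card edited in this commit.
card = """---
language:
- en
pipeline_tag: text-to-speech
base_model: yl4579/StyleTTS2-LibriTTS
---

# Vokan
"""

base = front_matter_field(card, "base_model")
```

Here `base` comes back as `yl4579/StyleTTS2-LibriTTS`, which is exactly what the Hub's "finetuned from" link is built from.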