bookbot/lightspeech-mfa-sw-v4

w11wo committed on Aug 5

Commit e089318 • 1 Parent(s): 45b8e55

Updated README

Files changed (1)

README.md +4 -4

@@ -9,9 +9,9 @@ tags:
 inference: false
 datasets:
   - bookbot/sw-TZ-Victoria
-  - bookbot/sw-TZ-Victoria-syllables
   - bookbot/sw-TZ-Victoria-v2
-  - bookbot/sw-TZ-VictoriaNeural
 ---
 # LightSpeech MFA SW v4
@@ -19,9 +19,9 @@ datasets:
 LightSpeech MFA SW v4 is a text-to-mel-spectrogram model based on the [LightSpeech](https://arxiv.org/abs/2102.04040) architecture. This model was fine-tuned from [LightSpeech MFA SW v1](https://huggingface.co/bookbot/lightspeech-mfa-sw-v1) and trained on real and synthetic audio datasets. The list of speakers include:
 - sw-TZ-Victoria
-- sw-TZ-Victoria-syllables
 - sw-TZ-Victoria-v2
-- sw-TZ-VictoriaNeural
 We trained an acoustic Swahili model on our speech corpus using [Montreal Forced Aligner v3.0.0](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) and used it as the duration extractor. That model, and consequently our model, uses the IPA phone set for Swahili. We used [gruut](https://github.com/rhasspy/gruut) for phonemization purposes. We followed these [steps](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/mfa_extraction) to perform duration extraction.

 inference: false
 datasets:
   - bookbot/sw-TZ-Victoria
+  - bookbot/sw-TZ-Victoria-syllables-word
   - bookbot/sw-TZ-Victoria-v2
+  - bookbot/sw-TZ-VictoriaNeural-upsampled-48kHz
 ---
 # LightSpeech MFA SW v4
 LightSpeech MFA SW v4 is a text-to-mel-spectrogram model based on the [LightSpeech](https://arxiv.org/abs/2102.04040) architecture. This model was fine-tuned from [LightSpeech MFA SW v1](https://huggingface.co/bookbot/lightspeech-mfa-sw-v1) and trained on real and synthetic audio datasets. The list of speakers include:
 - sw-TZ-Victoria
+- sw-TZ-Victoria-syllables-word
 - sw-TZ-Victoria-v2
+- sw-TZ-VictoriaNeural-upsampled-48kHz
 We trained an acoustic Swahili model on our speech corpus using [Montreal Forced Aligner v3.0.0](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) and used it as the duration extractor. That model, and consequently our model, uses the IPA phone set for Swahili. We used [gruut](https://github.com/rhasspy/gruut) for phonemization purposes. We followed these [steps](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/mfa_extraction) to perform duration extraction.
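
The net effect of this commit on the `datasets` front matter can be checked with a small sketch. The two lists below are copied from the old and new sides of the diff; `summarize_change` is a hypothetical helper written for illustration and is not part of the repository.

```python
# Dataset IDs copied from the README front matter before and after the commit.
old_datasets = [
    "bookbot/sw-TZ-Victoria",
    "bookbot/sw-TZ-Victoria-syllables",
    "bookbot/sw-TZ-Victoria-v2",
    "bookbot/sw-TZ-VictoriaNeural",
]
new_datasets = [
    "bookbot/sw-TZ-Victoria",
    "bookbot/sw-TZ-Victoria-syllables-word",
    "bookbot/sw-TZ-Victoria-v2",
    "bookbot/sw-TZ-VictoriaNeural-upsampled-48kHz",
]


def summarize_change(old, new):
    """Return (removed, added, unchanged) lists, preserving input order."""
    removed = [name for name in old if name not in new]
    added = [name for name in new if name not in old]
    unchanged = [name for name in old if name in new]
    return removed, added, unchanged


removed, added, unchanged = summarize_change(old_datasets, new_datasets)
for name in removed:
    print(f"- {name}")
for name in added:
    print(f"+ {name}")
```

Two entries are replaced with more specific dataset IDs and two are left untouched. The same four-line change is repeated in the speaker list in the README body, which gives the +4 -4 total.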