Updated README
README.md (CHANGED)
@@ -9,9 +9,9 @@ tags:
 inference: false
 datasets:
 - bookbot/sw-TZ-Victoria
-- bookbot/sw-TZ-Victoria-syllables
+- bookbot/sw-TZ-Victoria-syllables-word
 - bookbot/sw-TZ-Victoria-v2
-- bookbot/sw-TZ-VictoriaNeural
+- bookbot/sw-TZ-VictoriaNeural-upsampled-48kHz
 ---

 # LightSpeech MFA SW v4
@@ -19,9 +19,9 @@ datasets:
 LightSpeech MFA SW v4 is a text-to-mel-spectrogram model based on the [LightSpeech](https://arxiv.org/abs/2102.04040) architecture. This model was fine-tuned from [LightSpeech MFA SW v1](https://huggingface.co/bookbot/lightspeech-mfa-sw-v1) and trained on real and synthetic audio datasets. The list of speakers includes:

 - sw-TZ-Victoria
-- sw-TZ-Victoria-syllables
+- sw-TZ-Victoria-syllables-word
 - sw-TZ-Victoria-v2
-- sw-TZ-VictoriaNeural
+- sw-TZ-VictoriaNeural-upsampled-48kHz

 We trained an acoustic Swahili model on our speech corpus using [Montreal Forced Aligner v3.0.0](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) and used it as the duration extractor. That model, and consequently our model, uses the IPA phone set for Swahili. We used [gruut](https://github.com/rhasspy/gruut) for phonemization. We followed these [steps](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/mfa_extraction) to perform duration extraction.
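The README above relies on gruut for Swahili phonemization. The snippet below is a minimal sketch of what that step can look like; the example sentence is made up, and the exact IPA output depends on gruut's Swahili (`sw`) lexicon, so this illustrates the idea rather than reproducing the project's actual preprocessing script.

```python
# Minimal gruut phonemization sketch for Swahili (lang="sw").
# Requires: pip install gruut
from gruut import sentences

text = "Habari za asubuhi"  # example sentence, not taken from the datasets

for sentence in sentences(text, lang="sw"):
    for word in sentence:
        if word.phonemes:  # punctuation tokens carry no phonemes
            print(word.text, " ".join(word.phonemes))
```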
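The duration extraction mentioned in the README ultimately turns MFA's phone alignments (start and end times in seconds, read from TextGrid files) into per-phone mel-frame counts that the duration predictor is trained on. The helper below is a hypothetical sketch of that conversion only; the sample rate and hop length are assumptions rather than values stated in the README, and the linked TensorFlowTTS `mfa_extraction` scripts are what was actually used.

```python
# Hypothetical sketch: convert MFA phone alignments (in seconds) into
# per-phone mel-frame durations. The STFT parameters are assumptions.
SAMPLE_RATE = 44100  # assumed audio sample rate
HOP_LENGTH = 512     # assumed mel-spectrogram hop length

def durations_in_frames(phone_intervals):
    """phone_intervals: list of (phone, start_sec, end_sec) tuples
    read from an MFA TextGrid phone tier."""
    durations = []
    for phone, start, end in phone_intervals:
        frames = round((end - start) * SAMPLE_RATE / HOP_LENGTH)
        durations.append((phone, max(int(frames), 1)))  # avoid zero-length phones
    return durations

# Example with made-up alignment values:
print(durations_in_frames([("h", 0.00, 0.06), ("a", 0.06, 0.18), ("b", 0.18, 0.26)]))
```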