metadata
language: sw
license: cc-by-sa-4.0
tags:
- audio
- text-to-speech
inference: false
datasets:
- bookbot/OpenBible_Swahili
VITS Base sw-KE-OpenBible
VITS Base sw-KE-OpenBible is an end-to-end text-to-speech model based on the VITS architecture. This model was trained from scratch on a real audio dataset. The list of real speakers includes:
- sw-KE-OpenBible
The model's vocabulary contains the different IPA phonemes found in gruut.
This model was trained using the VITS framework. All training was done on a Scaleway L40S VM with an NVIDIA L40S GPU. All scripts used for training can be found in the Files and versions tab, along with the training metrics logged via TensorBoard.
Model
Model | SR (Hz) | Mel range (Hz) | FFT / Hop / Win | Epochs |
---|---|---|---|---|
VITS Base sw-KE-OpenBible | 44100 | 0-null | 2048 / 512 / 2048 | 12000 |
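To make the audio settings in the table more concrete, the short Python sketch below derives a few quantities from them: the number of frequency bins per frame, the Nyquist frequency, and the hop and window durations. The dictionary keys and the `frame_stats` helper are illustrative names chosen here, not taken from the training config. The `null` upper mel bound presumably means that no explicit limit was set, in which case the Nyquist frequency would be the natural ceiling, but that reading is an assumption.

```python
# Values copied from the table above. The key names are assumptions made
# for this sketch and may not match the actual training config.
stft = {
    "sampling_rate": 44100,  # 44.1 kHz
    "filter_length": 2048,   # FFT size
    "hop_length": 512,       # samples between consecutive frames
    "win_length": 2048,      # analysis window size in samples
}


def frame_stats(cfg):
    """Derive basic time/frequency resolution figures from STFT settings."""
    sr = cfg["sampling_rate"]
    return {
        "freq_bins": cfg["filter_length"] // 2 + 1,  # one-sided spectrum
        "nyquist_hz": sr / 2,
        "hop_ms": 1000 * cfg["hop_length"] / sr,
        "window_ms": 1000 * cfg["win_length"] / sr,
        "frames_per_second": sr / cfg["hop_length"],
    }


stats = frame_stats(stft)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

With these settings each frame advances by roughly 11.6 ms and covers roughly 46.4 ms of audio, so consecutive windows overlap by 75%.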
Training procedure
Prepare Data
python preprocess.py \
--text_index 1 \
--filelists filelists/sw-KE-OpenBible_text_train_filelist.txt filelists/sw-KE-OpenBible_text_val_filelist.txt \
--text_cleaners swahili_cleaners
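As a rough illustration of what `--text_index 1` refers to, the sketch below assumes each filelist line is pipe-separated, with the audio path in column 0 and the transcript in column 1. That layout, the `split_filelist_line` helper, and the sample line are all assumptions made for illustration; the actual filelists and `preprocess.py` in the Files and versions tab are the authoritative reference.

```python
def split_filelist_line(line, text_index=1, sep="|"):
    """Return (audio_path, text) from one filelist line.

    Assumes a pipe-separated layout with the audio path in column 0
    and the transcript in column `text_index`.
    """
    fields = line.rstrip("\n").split(sep)
    return fields[0], fields[text_index]


# Hypothetical sample line; the path and sentence are not from the dataset files.
sample = "wavs/hypothetical_0001.wav|Hapo mwanzo Mungu aliumba mbingu na nchi.\n"
path, text = split_filelist_line(sample)
print(path)
print(text)
```

Under this assumption, the text cleaner named in the command would be applied to the `text` field only, leaving the audio path untouched.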
Train
python train.py -c configs/sw_ke_openbible_base.json -m sw_ke_openbible_base
Frameworks
- PyTorch 2.2.2
- VITS