metadata
language: sw
license: cc-by-sa-4.0
tags:
  - audio
  - text-to-speech
inference: false
datasets:
  - bookbot/OpenBible_Swahili

VITS Base sw-KE-OpenBible

VITS Base sw-KE-OpenBible is an end-to-end text-to-speech model based on the VITS architecture. It was trained from scratch on a real audio dataset. The real speakers include:

  • sw-KE-OpenBible

The model's vocabulary consists of the IPA phonemes defined in gruut.
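As a rough illustration of how a phoneme vocabulary is consumed, VITS-style models map each character of the phonemized text to an integer ID and can optionally intersperse a blank token between IDs (the `add_blank` option in upstream VITS configs). The sketch below assumes those upstream conventions; the `SYMBOLS` list and the `phonemes_to_ids` helper are hypothetical and are not this model's actual symbol table.

```python
# HYPOTHETICAL, tiny symbol table for illustration only; the real vocabulary
# is the set of gruut IPA phonemes defined in this repository's text code.
PAD = "_"
SYMBOLS = [PAD, " ", ".", ",", "a", "b", "e", "i", "k", "m", "n", "o", "u", "ɲ", "ʃ"]
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}


def phonemes_to_ids(phonemes: str, add_blank: bool = True) -> list[int]:
    """Map each phoneme character to its ID, skipping unknown characters.

    With add_blank=True, a blank token (ID 0, the pad symbol) is placed
    before, between and after the IDs, as upstream VITS does.
    """
    ids = [SYMBOL_TO_ID[ch] for ch in phonemes if ch in SYMBOL_TO_ID]
    if not add_blank:
        return ids
    result = [0] * (2 * len(ids) + 1)
    result[1::2] = ids  # real tokens occupy the odd positions
    return result


print(phonemes_to_ids("mambo", add_blank=False))  # [9, 4, 9, 5, 11]
print(phonemes_to_ids("mambo"))  # [0, 9, 0, 4, 0, 9, 0, 5, 0, 11, 0]
```

Because the lookup is per character, a multi-character phoneme such as a long vowel with a length mark would be split into several IDs under this sketch.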

This model was trained using the VITS framework. All training was done on a Scaleway L40S VM with an NVIDIA L40S GPU. All scripts used for training can be found in the Files and versions tab, along with the training metrics logged via TensorBoard.

Model

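| Model                     | SR (Hz) | Mel range (Hz) | FFT / Hop / Win (samples) | # Epochs |
| ------------------------- | ------- | -------------- | ------------------------- | -------- |
| VITS Base sw-KE-OpenBible | 44,100  | 0-null         | 2048 / 512 / 2048         | 12000    |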
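The STFT settings in the table translate into concrete time and frequency resolution. The short calculation below only uses the table's numbers (44.1 kHz sample rate, FFT size 2048, hop 512, window 2048); the helper name `ms` is illustrative.

```python
SAMPLE_RATE = 44_100  # Hz
N_FFT = 2048          # FFT size (filter_length)
HOP = 512             # hop between frames, in samples
WIN = 2048            # analysis window, in samples


def ms(samples: int, sr: int = SAMPLE_RATE) -> float:
    """Convert a length in samples to milliseconds."""
    return 1000.0 * samples / sr


hop_ms = ms(HOP)                      # time step between spectrogram frames
win_ms = ms(WIN)                      # duration of each analysis window
frames_per_second = SAMPLE_RATE / HOP
nyquist_hz = SAMPLE_RATE / 2          # highest representable frequency
freq_bins = N_FFT // 2 + 1            # one-sided linear-spectrogram bins

print(f"hop {hop_ms:.2f} ms, window {win_ms:.2f} ms")
print(f"{frames_per_second:.2f} frames/s, {freq_bins} bins up to {nyquist_hz:.0f} Hz")
# hop 11.61 ms, window 46.44 ms
# 86.13 frames/s, 1025 bins up to 22050 Hz
```

So one second of audio corresponds to roughly 86 spectrogram frames, each computed from about 46 ms of signal.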

Training procedure

Prepare Data

python preprocess.py \
    --text_index 1 \
    --filelists filelists/sw-KE-OpenBible_text_train_filelist.txt filelists/sw-KE-OpenBible_text_val_filelist.txt \
    --text_cleaners swahili_cleaners
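In upstream VITS, `preprocess.py` reads each pipe-delimited filelist line (`audio_path|text`), applies the chosen cleaners to the field selected by `--text_index` (index 1, the transcript, here), and writes the result to a sibling file with a `.cleaned` suffix. The sketch below mirrors that loop, assuming this repository keeps the upstream behavior. `toy_cleaner` is a hypothetical stand-in for `swahili_cleaners`, whose real implementation presumably also phonemizes the text with gruut.

```python
from typing import Callable, Iterable


def toy_cleaner(text: str) -> str:
    """HYPOTHETICAL stand-in for swahili_cleaners: lowercase and collapse
    whitespace only (no phonemization)."""
    return " ".join(text.lower().split())


def clean_filelist(lines: Iterable[str], text_index: int = 1,
                   cleaner: Callable[[str], str] = toy_cleaner) -> list[str]:
    """Apply `cleaner` to field `text_index` of each pipe-delimited line."""
    out = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.strip().split("|")
        fields[text_index] = cleaner(fields[text_index])
        out.append("|".join(fields))
    return out


# Hypothetical filelist line: audio path, then transcript.
print(clean_filelist(["wavs/sample_0001.wav|Habari  ya Asubuhi"]))
# ['wavs/sample_0001.wav|habari ya asubuhi']
```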

Train

python train.py -c configs/sw_ke_openbible_base.json -m sw_ke_openbible_base
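The `-c` flag points `train.py` at a JSON config and `-m` names the run. Assuming the config follows the upstream VITS layout, its `data` section holds the audio settings listed in the Model table. The excerpt below is a hypothetical reconstruction from that table (not the actual file), paired with a small sanity check; reading `"mel_fmax": null` as "up to Nyquist" is an assumption based on librosa's mel filterbank defaulting to `sr / 2` when `fmax` is unset.

```python
import json

# HYPOTHETICAL excerpt of configs/sw_ke_openbible_base.json: key names follow
# upstream VITS configs, values mirror the Model table.
CONFIG_TEXT = """
{
  "data": {
    "sampling_rate": 44100,
    "filter_length": 2048,
    "hop_length": 512,
    "win_length": 2048,
    "mel_fmin": 0.0,
    "mel_fmax": null
  }
}
"""


def check_audio_config(cfg: dict) -> dict:
    """Sanity-check STFT sizes and resolve an unset mel_fmax."""
    data = cfg["data"]
    sr = data["sampling_rate"]
    if not data["hop_length"] <= data["win_length"] <= data["filter_length"]:
        raise ValueError("expected hop_length <= win_length <= filter_length")
    # JSON null loads as None; assume it means "up to Nyquist" (sr / 2).
    fmax = data["mel_fmax"] if data["mel_fmax"] is not None else sr / 2
    return {"sr": sr, "fmin": data["mel_fmin"], "fmax": fmax}


print(check_audio_config(json.loads(CONFIG_TEXT)))
# {'sr': 44100, 'fmin': 0.0, 'fmax': 22050.0}
```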

Frameworks

  • PyTorch 2.2.2
  • VITS