Text-to-Speech (TTS) with VITS trained on Kiswahili and Luganda Common Voice
This repository provides the tools needed for Text-to-Speech (TTS) with Coqui TTS, using a VITS model fine-tuned on Kiswahili and Luganda Common Voice v13 data from six speakers with similar intonation.
The pre-trained model takes text as input and produces an audio waveform as output.
How to Synthesize Speech using our models
First, install the Coqui TTS package:
pip install TTS
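Before loading a model, it can help to confirm that the package is visible to the Python interpreter you are running. The snippet below is a minimal sketch using only the standard library; the helper name `tts_is_installed` is hypothetical and not part of Coqui TTS.

```python
import importlib.util


def tts_is_installed() -> bool:
    """Return True if a top-level package named TTS can be found (hypothetical helper)."""
    # find_spec returns None when no top-level module of that name is importable.
    return importlib.util.find_spec("TTS") is not None


if __name__ == "__main__":
    print("TTS installed:", tts_is_installed())
```

If this reports False, the install most likely went to a different environment than the one running the script.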
Perform Text-to-Speech (TTS)
from TTS.utils.synthesizer import Synthesizer

# Load the fine-tuned VITS checkpoint and its configuration file.
# The other Synthesizer arguments (speakers file, vocoder, encoder, ...) keep their defaults.
synthesizer = Synthesizer(
    tts_checkpoint="<model checkpoint path>",
    tts_config_path="<model configuration file>",
)

sentence_to_synthesize = "Your Kiswahili or Luganda sentence here"
if sentence_to_synthesize:
    print(sentence_to_synthesize)
    # Speaker name, language name and speaker wav are left unset.
    wav = synthesizer.tts(sentence_to_synthesize, None, None, None)
    location = "output.wav"  # Choose a desired name for the output file
    synthesizer.save_wav(wav, location)
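To synthesize several sentences in one run, the same two calls can be wrapped in a loop. The following is a sketch rather than part of this repository: the function `synthesize_to_files`, the `outputs` directory, and the `utterance_NNN.wav` naming scheme are all assumptions chosen for illustration. It only relies on the `tts` and `save_wav` methods used above.

```python
from pathlib import Path


def synthesize_to_files(synthesizer, sentences, out_dir="outputs"):
    """Synthesize each non-empty sentence to its own WAV file (hypothetical helper).

    `synthesizer` is expected to expose `tts(...)` and `save_wav(wav, path)`,
    as the Coqui TTS Synthesizer does in the example above.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    written = []
    for sentence in sentences:
        text = sentence.strip()
        if not text:
            continue  # skip blank lines instead of synthesizing silence
        # Same call as in the single-sentence example: speaker, language
        # and speaker wav are left unset.
        wav = synthesizer.tts(text, None, None, None)
        # Number files by how many have been written so far (assumed naming scheme).
        target = out_path / f"utterance_{len(written):03d}.wav"
        synthesizer.save_wav(wav, str(target))
        written.append(target)
    return written
```

With the `synthesizer` object created earlier, calling `synthesize_to_files(synthesizer, list_of_sentences)` would return the list of paths that were written.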
Limitations
We do not provide any warranty on the performance achieved by this model when used on other datasets.
Citing
Please cite our work if you use our models in your research or business.
@inproceedings{buildingTTS,
  title={Building a Luganda Text-to-Speech Model from Crowdsourced Data},
  author={Kagumire, Sulaiman and Katumba, Andrew and Nakatumba-Nabende, Joyce and Quinn, John},
  booktitle={5th Workshop on African Natural Language Processing},
  year={2024}
}