Bangla TTS
This Bangla TTS model was trained on a single (male) speaker using the VITS TTS model. The paper is "ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer". We used the coqui-ai 🐸 toolkit for Bangla text-to-speech training as well as inference.
Contributions
We collected various Bangla datasets from the internet, including some data from the Mozilla Common Voice dataset, and trained the model on them.
We developed a Bangla VITS TTS (text-to-speech) system, which we trained and used to read various Bangla text with the highest-performing state-of-the-art (SOTA) Bangla neural voice.
Dataset
The Bangla Text-to-Speech (TTS) Team at IIT Madras has curated a Bangla speech corpus, which has been processed for research purposes. The dataset has been downsampled to 22050 Hz and reformatted from the original IITM annotation style to the LJSpeech format. This refined dataset, tailored for Bangla TTS, is accompanied by the weight files of the best-trained models. If you use this dataset in your research, please cite the corresponding paper, available at Paper Link. The provided dataset and model weights are intended as a resource for further Bangla TTS research. Dataset Link
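The reformatting step described above can be sketched as follows. This is a minimal illustration, not the conversion script actually used: the input annotation layout (one `( utterance_id "text" )` entry per line) and the helper names are assumptions made for the example. Only the output layout, the LJSpeech `id|text|normalized text` metadata line, is the standard format. Resampling the audio to 22050 Hz is a separate step that needs an audio library and is not shown.

```python
import re

# ASSUMPTION: the source annotation has one utterance per line, shaped like
#   ( utt_0001 " transcript text " )
# This layout is hypothetical; adapt the pattern to the real annotation files.
ANNOTATION_RE = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)\s*$')


def parse_annotation_line(line):
    """Return (utterance_id, text), or None if the line does not match."""
    match = ANNOTATION_RE.match(line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


def to_ljspeech_rows(lines):
    """Convert annotation lines to LJSpeech rows: (id, text, normalized text)."""
    rows = []
    for line in lines:
        parsed = parse_annotation_line(line)
        if parsed is None:
            continue  # skip blank or malformed lines
        utt_id, text = parsed
        # No separate normalization is applied here, so both text fields match.
        rows.append((utt_id, text, text))
    return rows


def write_metadata(rows, out_path):
    """Write rows as pipe-separated lines, as in LJSpeech's metadata.csv."""
    with open(out_path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write("|".join(row) + "\n")
```

In the LJSpeech layout, each id is expected to match a file named `<id>.wav` in a `wavs/` folder next to `metadata.csv`.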
Evaluation
Mean Opinion Score (MOS): 4.10 (MOS Calculation method)
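For readers unfamiliar with the metric, a MOS is conventionally the arithmetic mean of listener ratings on a 1 to 5 scale. The sketch below shows that conventional definition, with a normal-approximation 95% confidence interval added for context. It is not necessarily the exact procedure behind the score reported here, and the ratings in the usage example are made up for illustration.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (mos, ci95_half_width) for a list of 1-5 listener ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.fmean(ratings)
    if len(ratings) < 2:
        return mos, 0.0  # an interval cannot be estimated from one rating
    # Standard error of the mean, scaled by 1.96 for an approximate 95% interval.
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, 1.96 * sem


# Hypothetical ratings, not data from this model's evaluation.
example_ratings = [5, 4, 4, 3, 5, 4]
mos, ci = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

Reporting the interval alongside the mean makes it easier to judge whether a difference between two systems is larger than the listener-to-listener variation.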
Inference
For testing, please check the endpoint integration on GitHub.
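For local experiments, inference with the Coqui toolkit can be sketched roughly as below. This is a hedged illustration rather than the project's endpoint code: the function name and the file paths in the usage comment are placeholders, and the `TTS.api.TTS` interface used here (`model_path`, `config_path`, `tts_to_file`) may differ between Coqui TTS versions, so check it against the installed release.

```python
from pathlib import Path


def synthesize_bangla(text, checkpoint_path, config_path, out_path="output.wav"):
    """Synthesize `text` to a WAV file using a locally downloaded checkpoint."""
    checkpoint = Path(checkpoint_path)
    config = Path(config_path)
    # Fail early with a clear error before loading the (heavy) TTS library.
    for required in (checkpoint, config):
        if not required.is_file():
            raise FileNotFoundError(f"missing model file: {required}")

    # Imported lazily so the module loads even where Coqui TTS is not installed.
    from TTS.api import TTS

    tts = TTS(model_path=str(checkpoint), config_path=str(config), progress_bar=False)
    tts.tts_to_file(text=text, file_path=str(out_path))
    return Path(out_path)


# Example call; the file names are placeholders for the downloaded weights:
# synthesize_bangla("আমি বাংলায় গান গাই", "checkpoint.pth", "config.json", "demo.wav")
```

Checking the paths first keeps the error message readable when the weights have not been downloaded yet.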