Model Dataset
#1 by Sadique5 - opened
I'd like to get the dataset used to train this model. It performs really badly, so I'm also thinking of fine-tuning it on my own dataset.
Thanks. The current model ignores several letters when reading.
Hey @Sadique5! The model was trained on normalised text (i.e. excluding casing and punctuation), which is why these are not respected by the model when you perform inference. You can try using the hyphen character (-) to add pauses in the speech, since it is present in the vocabulary of the Arabic MMS TTS checkpoint and serves mostly this purpose. Otherwise, ensure all your text is lowercased and in the vocabulary of the model: https://huggingface.co/facebook/mms-tts-ara/blob/main/vocab.json
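If it helps, here is a minimal sketch of how you could check your text against the vocabulary before running inference. It assumes vocab.json is a plain JSON object mapping each character to a token id, and that you have saved it locally. The function names and the toy vocabulary are made up for illustration and are not part of any library.

```python
import json


def load_vocab(path):
    """Load a vocab.json file (assumed to map characters to token ids)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_out_of_vocab(text, vocab):
    """Return the distinct characters of `text` that are missing from `vocab`,
    in order of first appearance. The text is lowercased first."""
    missing = []
    for ch in text.lower():
        if ch not in vocab and ch not in missing:
            missing.append(ch)
    return missing


def keep_in_vocab(text, vocab):
    """Lowercase `text` and drop every character that is not in `vocab`."""
    return "".join(ch for ch in text.lower() if ch in vocab)


# Hypothetical toy vocabulary, only to show the behaviour.
# For the real model, use: vocab = load_vocab("vocab.json")
toy_vocab = {"a": 0, "b": 1, " ": 2, "-": 3}

print(find_out_of_vocab("Ab, ba-c!", toy_vocab))  # [',', 'c', '!']
print(keep_in_vocab("Ab, ba-c!", toy_vocab))      # ab ba-
```

Running find_out_of_vocab on your own sentences should show which characters the model is likely skipping. Note that simply dropping them changes what gets read out, so replacing them with in-vocabulary equivalents may give better results.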