Model Dataset

by Sadique5 - opened Sep 11, 2023

Sep 11, 2023

I'd like to get the dataset used to train this model. It performs really badly, so I'm also thinking of fine-tuning it on my own dataset.

lysandre

Sep 11, 2023

Hello @Sadique5, section 3 of the paper is all about the dataset creation.

Sadique5

Sep 12, 2023

Thanks. The current model ignores several letters when reading text.

sanchit-gandhi

Sep 28, 2023

Hey @Sadique5! The model was trained on normalised text (i.e. without casing or punctuation), which is why these are not respected when you run inference. You can try using the hyphen character (-) to add pauses in the speech, since it is present in the vocabulary of the Arabic MMS TTS checkpoint and mostly serves this purpose. Otherwise, make sure all your text is lowercased and uses only characters in the model's vocabulary: https://huggingface.co/facebook/mms-tts-ara/blob/main/vocab.json
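To see which characters in an input fall outside the vocabulary, a small check along these lines may help. This is only a sketch: it assumes vocab.json is a flat mapping from single characters to ids, and the helper name `normalise_for_vocab` and the toy vocabulary are hypothetical, used here purely for illustration.

```python
import json


def normalise_for_vocab(text, vocab):
    """Lowercase `text` and split its characters into kept and dropped.

    `vocab` is assumed to be a mapping (or set) whose keys are single
    characters, as in a character-level vocab.json.
    """
    kept, dropped = [], []
    for ch in text.lower():
        if ch in vocab:
            kept.append(ch)
        else:
            dropped.append(ch)
    return "".join(kept), dropped


# Hypothetical toy vocabulary for illustration only. With a local copy of
# the real file, one could instead load it (assumed path):
#     with open("vocab.json", encoding="utf-8") as f:
#         vocab = json.load(f)
toy_vocab = {"a": 0, "b": 1, "c": 2, " ": 3, "-": 4}

clean, dropped = normalise_for_vocab("A b, C - a!", toy_vocab)
print(clean)    # text restricted to in-vocabulary characters
print(dropped)  # characters that would be ignored
```

Inspecting the dropped list is a quick way to find out which letters never reach the model, so they can be replaced or removed before synthesis.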
