How to fine-tune MMS text-to-speech models?

by allandclive - opened Jun 23, 2023

Discussion

allandclive

Jun 23, 2023

Is there any way to fine-tune the MMS-TTS models?

vineelpratap

Jun 23, 2023

@Matthijs and @sanchit-gandhi are working on this. We hope to have it soon.

cc. @bowenshi

kdcyberdude

Aug 8, 2023

Any updates on this??

allandclive

Aug 11, 2023

https://github.com/huggingface/transformers/pull/24085

sanchit-gandhi

Aug 14, 2023

It's ongoing! The model addition is in the final review stages, then we can work on a fine-tuning script (cc @ylacombe)

arbianqx

Sep 18, 2023

Any updates on this, @sanchit-gandhi?

AyoK

Oct 1, 2023

Following

rileydrizzy

Oct 27, 2023

any updates on this please?

ylacombe

Oct 30, 2023

Hi there, I'm currently working on finetuning VITS and MMS, stay tuned!

arbianqx

Nov 24, 2023

@ylacombe any updates on this?

ylacombe

Nov 24, 2023

Hey @arbianqx , it's still a WIP.

If you are interested, here are the two ongoing PRs I'm working on: https://github.com/huggingface/transformers/pull/27340 https://github.com/huggingface/transformers/pull/27244
Note that as long as the PRs are not merged, I can't really give you support on this.

On another note, what languages are you interested in? Finetuning MMS is an interesting task, and I'm trying to understand which languages are the most interesting to work on!

arbianqx

Nov 24, 2023

Hey @ylacombe, thanks for the quick reply.

I'll wait patiently for this.

Indeed it is. I'm planning to finetune this for Albanian (the MMS code for it is "sqi", if I'm not mistaken).

akashicmarga

Dec 11, 2023

Hi @ylacombe any update on this?

ylacombe

Dec 12, 2023

Hey, I haven't made any official announcements yet, but you can already find what you want in the following library: https://github.com/ylacombe/finetune-hf-vits

Don't hesitate to give feedback and share your finetuned models if you can!
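
To sanity-check a checkpoint before or after finetuning, a minimal inference sketch with the `VitsModel` and `AutoTokenizer` classes from transformers could look like the following. It assumes `torch` and `transformers` are installed; the function name `synthesize` is only illustrative and is not part of the finetuning library linked above.

```python
def synthesize(model_id, text):
    """Generate speech for `text` with an MMS-TTS (VITS) checkpoint.

    Returns (waveform, sampling_rate), where waveform is a 1-D NumPy array.
    `synthesize` is a hypothetical helper name, not an official API.
    """
    # Imported lazily so that defining the helper does not require the
    # heavy dependencies to be installed.
    import torch
    from transformers import AutoTokenizer, VitsModel

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = VitsModel.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # The model output exposes the audio as `waveform`,
        # shaped (batch, num_samples).
        waveform = model(**inputs).waveform

    return waveform[0].numpy(), model.config.sampling_rate


# Example call (downloads the checkpoint on first use):
# audio, sr = synthesize("facebook/mms-tts-eng", "Hello from MMS")
```

The same call should work with a local directory holding a finetuned checkpoint, assuming it was saved in the usual `save_pretrained` layout.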

rileydrizzy

Dec 18, 2023

Hey, thank you very much @ylacombe and the team. Much appreciated. 👍🏾

syedmuhammad

Jan 1 (edited Jan 1)

Hi @ylacombe, hope you're doing well. Could you please help me? I want to finetune an MMS-TTS model (facebook/mms-tts-urd-script_arabic) for Urdu. Specifically, I want to finetune it on audio from one particular speaker. How can I create a speaker embedding for that speaker and finetune the model so that it produces audio in that speaker's voice? Also, how can I do this if I want multiple speakers in the same model? Your help would be appreciated. Happy New Year!!!

hadiqa123

May 5

@sanchit-gandhi Hi Brother, any update on the finetuning of MMS-TTS (facebook/mms-tts-urd-script_arabic)?

syedmuhammad

May 5

Yeah, it's working great!!! thanks to @ylacombe

hadiqa123

May 5

@syedmuhammad Thank you for the response. Could you please share the link? Thanks

syedmuhammad

May 5

Kindly refer to the repo: https://github.com/ylacombe/finetune-hf-vits

hadiqa123

May 5 (edited May 5)

@syedmuhammad Thanks, I will check this.
Do you have your own training Colab notebook for Urdu using the following model?
facebook/mms-tts-urd-script_arabic

syedmuhammad

May 5

Yes

hadiqa123

May 5

@syedmuhammad Would you be willing to share it?

syedmuhammad

May 5

You can email me at: [email protected]

charbossly

May 9

During fine-tuning I got this error:

return tensor.to(device, non_blocking=non_blocking)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: BatchEncoding.to() got an unexpected keyword argument 'non_blocking'

Is there a solution?

khof312

May 23

@charbossly Maybe this solution will help: https://github.com/ylacombe/finetune-hf-vits/issues/22
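
For context, the traceback suggests that some code called `.to(device, non_blocking=...)` on the tokenizer output, whose `.to()` in that environment only accepted `device`. The sketch below reproduces that mismatch with stand-in classes and shows one possible workaround: walking the mapping and moving each tensor individually. `FakeTensor`, `FakeBatchEncoding`, and `move_to_device` are hypothetical names used only for illustration, and this is not necessarily the fix described in the linked issue (aligning library versions may be the cleaner route).

```python
from collections.abc import Mapping


class FakeTensor:
    """Hypothetical stand-in for a tensor: .to() accepts non_blocking."""

    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device, non_blocking=False):
        return FakeTensor(device)


class FakeBatchEncoding(dict):
    """Hypothetical stand-in for a batch whose .to() only accepts `device`."""

    def to(self, device):
        return FakeBatchEncoding({k: v.to(device) for k, v in self.items()})


def move_to_device(obj, device, non_blocking=False):
    """Move tensors one by one instead of calling .to() on the container."""
    if isinstance(obj, Mapping):
        # Recurse into the mapping and return a plain dict, so the
        # container's own restricted .to() is never called.
        return {k: move_to_device(v, device, non_blocking) for k, v in obj.items()}
    if hasattr(obj, "to"):
        return obj.to(device, non_blocking=non_blocking)
    return obj  # Non-tensor values (e.g. strings) are passed through unchanged.


batch = FakeBatchEncoding(input_ids=FakeTensor())
# batch.to("cuda", non_blocking=True) would raise the TypeError seen above.
moved = move_to_device(batch, "cuda", non_blocking=True)
print(moved["input_ids"].device)
```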

charbossly

Jun 11

@khof312 Thanks for your solution.

yaambe

Jun 26

Hello,

I recently finetuned an MMS model using Hugging Face tools provided at https://github.com/ylacombe/finetune-hf-vits and have successfully obtained a VITS model in the model.safetensors format. While I found documentation on how to export the MMS model to Sherpa-ONNX (https://k2-fsa.github.io/sherpa/onnx/tts/mms.html), I couldn't find information on how to export this specific TTS model to Sherpa ONNX.

Could you please provide guidance or steps on how to achieve this?

yukiarimo

Sep 8

Any updates?
