---
license: apache-2.0
language:
- fr
pipeline_tag: text-to-speech
tags:
- TTS
- text-to-speech
---
**V2.5 Model :**
Fine-tune of my V2 model on the full Common Voice dataset (517k samples) for 2.5k steps (batch size 200). Voice cloning has improved a bit but is still not great. However, if you fine-tune this model on your own dataset of a specific voice, you can get pretty good results. A good V3 model would probably need around 50k steps of fine-tuning on this dataset; I think that could give good results, but I won't try it.
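
To put these step counts in perspective, here is a minimal sketch that converts steps into passes over the dataset. It assumes each step consumes exactly one batch of 200 samples (no gradient accumulation) and that a V3 run would keep the same batch size; neither is confirmed above, and `epochs_covered` is just an illustrative helper.

```python
def epochs_covered(steps: int, batch_size: int, dataset_size: int) -> float:
    """Passes over the dataset, assuming one batch is consumed per step."""
    return steps * batch_size / dataset_size


DATASET_SIZE = 517_000  # Common Voice samples quoted above
BATCH_SIZE = 200        # batch size quoted for V2.5; assumed unchanged for V3

v25 = epochs_covered(2_500, BATCH_SIZE, DATASET_SIZE)
v3 = epochs_covered(50_000, BATCH_SIZE, DATASET_SIZE)

print(f"V2.5 (2.5k steps): {v25:.2f} epochs")
print(f"V3 idea (50k steps): {v3:.1f} epochs")
```

Under these assumptions V2.5 saw slightly less than one full pass over the data, while a 50k-step run would amount to roughly 19 passes.
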
**V2 Model :**
Tortoise base model fine-tuned on a custom multi-speaker French dataset of 120k samples (SIWIS + Common Voice subset + M-AILABS) for 10k steps on an RTX 3090 (about 21 hours of training), with Text LR Weight set to 1.
Result: the model speaks French much better, without an English accent, but voice cloning hardly works.
**V1 Model :**
Tortoise base model fine-tuned on a custom multi-speaker French dataset of 24k samples (SIWIS + Common Voice subset) for 8850 steps on an RTX 3090 (about 19 hours of training).
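
As a rough guide for planning a similar run, the training times quoted for V1 and V2 can be turned into a per-step cost. This is only a sketch: the hour figures are approximate, the batch size for these two runs is not stated, and `seconds_per_step` is an illustrative helper, so the numbers should not be extrapolated to other batch sizes or hardware.

```python
def seconds_per_step(hours: float, steps: int) -> float:
    """Average wall-clock seconds spent per training step."""
    return hours * 3600 / steps


# Figures quoted above (single RTX 3090, approximate durations)
v1 = seconds_per_step(hours=19, steps=8_850)
v2 = seconds_per_step(hours=21, steps=10_000)

print(f"V1: {v1:.1f} s/step")
print(f"V2: {v2:.1f} s/step")
```

Both runs come out at somewhere between 7 and 8 seconds per step, which suggests the larger V2 dataset did not change the per-step cost much.
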
**Inference :**
* To use the model, download "V2_9750_gpt.pth" and load it in one of the optimized tortoise-tts forks (git.ecker.tech/mrq/ai-voice-cloning | 152334H/tortoise-tts-fast).
**Fine tuning :**
* I used 152334H/DL-Art-School for training. If you want to resume training from my checkpoint, follow its documentation and download "V2_9750.state".