YouTube: smotto_ai | TikTok: smotto_ai
RVC v2 model
Don't forget to credit me @Smotto if you do use this model!
Data
- English talking-style audio taken from her live podcast. Clips were cut and cleaned manually; I didn't keep any clips with background noise (too lazy to edit it out between words).
- 61.6 MB of data, 5 min 36 s in total
- 48 kHz, 16-bit audio files
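
As a rough sanity check on the figures above, the size and duration line up if the clips are uncompressed stereo PCM. The channel count is an assumption (it isn't stated here), and `pcm_size_bytes` is just an illustrative helper:

```python
# Rough size check for the dataset, assuming uncompressed stereo PCM (WAV).
# The channel count is an assumption; only rate, bit depth and duration are given.
SAMPLE_RATE_HZ = 48_000
BYTES_PER_SAMPLE = 2      # 16-bit
CHANNELS = 2              # assumed
DURATION_S = 5 * 60 + 36  # 5 min 36 s


def pcm_size_bytes(duration_s, sample_rate_hz, bytes_per_sample, channels):
    """Raw PCM payload size in bytes, ignoring file headers."""
    return duration_s * sample_rate_hz * bytes_per_sample * channels


size_bytes = pcm_size_bytes(DURATION_S, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, CHANNELS)
size_mib = size_bytes / 1024**2
print(f"{size_bytes} bytes, about {size_mib:.1f} MiB")
```

Under that assumption the payload works out to roughly 61.5 MiB, close to the 61.6 MB listed. Mono files would be about half that.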
Processing
- Split audio clips using whisperX
- Kim vocal 1 -> Reverb HQ -> Karaoke 2 (if needed) -> DeEcho -> Denoise
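
To make the ordering explicit, here is a minimal sketch of the cleanup chain with the optional Karaoke pass. It only builds the list of stage names and doesn't run any separation model; `build_chain` and `needs_karaoke` are hypothetical names used for illustration:

```python
# Sketch of the cleanup order only; it does not run any separation model.
# Stage names are copied from the chain above; build_chain is a hypothetical helper.
def build_chain(needs_karaoke: bool) -> list[str]:
    chain = ["Kim vocal 1", "Reverb HQ"]
    if needs_karaoke:
        # Optional pass, applied only to clips that need it.
        chain.append("Karaoke 2")
    chain += ["DeEcho", "Denoise"]
    return chain


print(" -> ".join(build_chain(needs_karaoke=False)))
```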
Hyper-parameters
- Pitch extraction method: mangio-crepe
- Batch size: 6
- Pitch extraction hop length: 16
- Epochs: 300