---
language:
- ms
- en
---

# Malaysian Finetune Whisper Base

Whisper Base fine-tuned on the following Malaysian datasets:

1. IMDA STT, https://huggingface.co/datasets/mesolitica/IMDA-STT
2. Pseudolabeled Malaysian YouTube videos, https://huggingface.co/datasets/mesolitica/pseudolabel-malaysian-youtube-whisper-large-v3
3. Malay Conversational Speech Corpus, https://huggingface.co/datasets/malaysia-ai/malay-conversational-speech-corpus
4. Haqkiem TTS Dataset, which is private; access can be requested from https://www.linkedin.com/in/haqkiem-daim/
5. Pseudolabeled Nusantara audiobooks, https://huggingface.co/datasets/mesolitica/nusantara-audiobook

Training script: https://github.com/mesolitica/malaya-speech/tree/malaysian-speech/session/whisper

Weights & Biases run: https://wandb.ai/huseinzol05/malaysian-whisper-base?workspace=user-huseinzol05
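
The card does not include an inference example, so here is a minimal sketch of how a fine-tuned Whisper checkpoint like this one could be run through the `transformers` speech recognition pipeline. The repository id `mesolitica/malaysian-whisper-base` is an assumption inferred from the W&B project name and should be replaced with this model's actual id. The helper names `build_generate_kwargs` and `transcribe` are hypothetical, and restricting languages to `ms` and `en` simply mirrors the card metadata.

```python
from typing import Any, Dict

# Assumption: placeholder repo id, not confirmed by this card; replace as needed.
MODEL_ID = "mesolitica/malaysian-whisper-base"

# Languages listed in this card's metadata.
SUPPORTED_LANGUAGES = {"ms", "en"}


def build_generate_kwargs(language: str = "ms") -> Dict[str, Any]:
    """Return decoding options to forward to Whisper's generate()."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    return {"language": language, "task": "transcribe"}


def transcribe(audio_path: str, model_id: str = MODEL_ID, language: str = "ms") -> str:
    """Transcribe one audio file (hypothetical helper, requires transformers)."""
    # Imported lazily so build_generate_kwargs works without transformers installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # split long recordings into 30-second chunks
    )
    result = asr(audio_path, generate_kwargs=build_generate_kwargs(language))
    return result["text"]
```

Calling `transcribe("sample.wav")` would download the checkpoint on first use and return the recognized text. `sample.wav` is a placeholder path.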