huseinzol05 committed on
Commit
837051f
1 Parent(s): acd4fcd

Create README.md

  1. README.md +18 -0
README.md ADDED
@@ -0,0 +1,18 @@
+ ---
+ language:
+ - ms
+ - en
+ ---
+
+ # Malaysian Finetune Whisper Base
+
+ Whisper Base fine-tuned on the following Malaysian datasets:
+ 1. IMDA STT, https://huggingface.co/datasets/mesolitica/IMDA-STT
+ 2. Pseudolabeled Malaysian YouTube videos, https://huggingface.co/datasets/mesolitica/pseudolabel-malaysian-youtube-whisper-large-v3
+ 3. Malay Conversational Speech Corpus, https://huggingface.co/datasets/malaysia-ai/malay-conversational-speech-corpus
+ 4. Haqkiem TTS Dataset, which is private, but you can request access from https://www.linkedin.com/in/haqkiem-daim/
+ 5. Pseudolabeled Nusantara audiobooks, https://huggingface.co/datasets/mesolitica/nusantara-audiobook
+
+ Training script at https://github.com/mesolitica/malaya-speech/tree/malaysian-speech/session/whisper
+
+ Wandb logs at https://wandb.ai/huseinzol05/malaysian-whisper-base?workspace=user-huseinzol05
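
The README does not show how to run the fine-tuned checkpoint, so the following is only a minimal sketch of transcribing a short clip with the Hugging Face `transformers` ASR pipeline. Everything named here is an assumption rather than something the commit states: `MODEL_ID` is a placeholder (the commit does not name the model repository), and `build_generate_kwargs` and `transcribe` are hypothetical helpers. The language codes `ms` and `en` are taken from the front matter above.

```python
from typing import Any, Dict

# Placeholder: the commit does not name the model repository, so substitute
# the actual Hugging Face repo id before calling transcribe().
MODEL_ID = "<owner>/<model-name>"

# Language codes listed in the README front matter.
SUPPORTED_LANGUAGES = ("ms", "en")


def build_generate_kwargs(language: str) -> Dict[str, Any]:
    """Return Whisper generation options for plain transcription (hypothetical helper)."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"expected one of {SUPPORTED_LANGUAGES}, got {language!r}")
    return {"language": language, "task": "transcribe"}


def transcribe(audio_path: str, language: str = "ms", model_id: str = MODEL_ID) -> str:
    """Transcribe one short audio file with the transformers ASR pipeline."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path, generate_kwargs=build_generate_kwargs(language))
    return result["text"]
```

With a real repo id filled in, `transcribe("clip.wav", language="ms")` should return the transcript as a string. Decoding a file path this way relies on `ffmpeg` being installed, and clips longer than about 30 seconds would likely need chunking options that this sketch leaves out.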