Update README.md
README.md
CHANGED
@@ -6,7 +6,8 @@ tags:
 - whisper-event
 - generated_from_trainer
 datasets:
-- mozilla-foundation/
+- mozilla-foundation/common_voice_13_0
+- google/fleurs
 metrics:
 - wer
 model-index:
@@ -25,6 +26,7 @@ model-index:
 - name: Wer
 type: wer
 value: 8.44
+library_name: transformers
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -32,8 +34,8 @@ should probably proofread and complete it, then remove this comment. -->
 
 # Whisper Medium (Thai): Combined V2
 
-This model is a fine-tuned
-It achieves the following results on the evaluation set:
+This model is a fine-tuned, augmented versions of [biodatlab/whisper-medium-th-1000iter](https://huggingface.co/biodatlab/whisper-medium-th-1000iter) on the mozilla-foundation/common_voice_13_0 th, google/fleurs, and curated datasets.
+It achieves the following results (NOT-UP-TO-DATE) on the common-voice-11 evaluation set:
 - Loss: 0.1475
 - WER: 13.03 (without Tokenizer)
 - WER: 8.44 (with Deepcut Tokenizer)
@@ -45,7 +47,7 @@ Use the model with huggingface's `transformers` as follows:
 ```py
 from transformers import pipeline
 
-MODEL_NAME = "biodatlab/whisper-medium-th-combined
+MODEL_NAME = "biodatlab/whisper-medium-th-combined" # specify the model name
 lang = "th" # change to Thai langauge
 
 device = 0 if torch.cuda.is_available() else "cpu"
@@ -96,10 +98,10 @@ The following hyperparameters were used during training:
 
 ### Framework versions
 
-- Transformers 4.
-- Pytorch 1.
-- Datasets 2.
-- Tokenizers 0.13.
+- Transformers 4.31.0.dev0
+- Pytorch 2.1.0
+- Datasets 2.13.1
+- Tokenizers 0.13.3
 
 ## Citation
 
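The model card in this diff reports two WER figures, one "without Tokenizer" and one "with Deepcut Tokenizer". The card does not show how either number was computed, so the following is only a minimal sketch of why the choice of tokenizer changes WER for a language written without spaces between words. The function names `edit_distance` and `wer`, and the `tokenize` parameter, are illustrative assumptions and not part of the model card or of any library.

```python
from typing import Callable, List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str,
        hypothesis: str,
        tokenize: Callable[[str], List[str]] = str.split) -> float:
    """Word error rate in percent, using a pluggable tokenizer.

    `tokenize` is a hypothetical hook: by default it splits on whitespace;
    a word segmenter could be passed instead for unspaced text.
    """
    ref_tokens = tokenize(reference)
    hyp_tokens = tokenize(hypothesis)
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


# Whitespace-separated text: one substitution out of four words.
spaced = wer("a b c d", "a b x d")

# Unspaced text with whitespace splitting: the whole string is one token,
# so a single wrong character counts as a fully wrong "word".
unspaced = wer("abcd", "abxd")

# Same unspaced text with a finer-grained tokenizer (here: per character,
# standing in for a real word segmenter).
segmented = wer("abcd", "abxd", tokenize=list)
```

With whitespace splitting, unsegmented text is scored in very coarse units, which tends to inflate the error rate; passing a word segmenter as `tokenize` scores the same transcripts at word granularity. Whether this matches the evaluation procedure behind the figures in the card is an assumption.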