wtlow003 committed
Commit eb7449f · Parent(s): ce81198

fix: README.md

README.md CHANGED
@@ -40,6 +40,13 @@ The recommended audio usage for testing should be:
 
 To use the model in an application, you can make use of `transformers`:
 
+```python
+# Use a pipeline as a high-level helper
+from transformers import pipeline
+
+pipe = pipeline("automatic-speech-recognition", model="jensenlwt/whisper-small-singlish-122k")
+```
+
 ### Out-of-Scope Use
 
 - Long form audio
@@ -47,16 +54,20 @@ To use the model in an application, you can make use of `transformers`:
 - Poor quality audio (audio samples are recorded in a controlled environment)
 - Conversation (as the model is not trained on conversation)
 
-
-## How to Get Started with the Model
-
-
 ## Training Details
 
 ### Training Data
 
+We made use of the [National Speech Corpus](https://www.imda.gov.sg/how-we-can-help/national-speech-corpus) for training.
+Specifically, we made use of **Part 2**, a series of prompted read-speech recordings that involve local named entities, slang, and dialect.
+
+To train, I made use of the first 300 transcripts in the corpus, which is around 122k samples from ~161 speakers.
+
 ### Training Procedure
 
+The model is fine-tuned with occasional interruptions to adjust the batch size and maximise GPU utilisation.
+In addition, I end training early if `eval_loss` does not decrease over two consecutive evaluation steps, based on previous training experience.
+
 #### Training Hyperparameters
 
 The following hyperparameters are used:
@@ -85,10 +96,14 @@ The following hyperparameters are used:
 | 3500 | 4.581152 | 0.0484 | 0.1741 | 8.145801 |
 | 4000 | 5.235602 | 0.0401 | 0.1773 | 8.138047 |
 
+The model with the lowest evaluation loss is used as the final checkpoint.
+
 ### Testing Data, Factors & Metrics
 
 #### Testing Data
 
+To test the model, I made use of the last 100 transcripts (held-out test set) in the corpus, which is around 43k samples.
+
 ### Results
 
 | Model | WER |
@@ -102,7 +117,6 @@ The following hyperparameters are used:
 
 ### Model Architecture and Objective
 
-
 ### Compute Infrastructure
 
 [More Information Needed]
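The pipeline snippet added in the first hunk builds the model but stops short of calling it. As a minimal usage sketch (the audio filename is a placeholder; decoding a file path requires ffmpeg to be installed):

```python
# Sketch: transcribe one short clip with the pipeline from the README.
# "sample.wav" is a placeholder; any short Singlish recording works, and the
# pipeline resamples it to the 16 kHz rate Whisper expects.
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="jensenlwt/whisper-small-singlish-122k",
)

result = pipe("sample.wav")
print(result["text"])  # decoded transcription
```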
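The training procedure added in the second hunk (stop after two evaluations without improvement, keep the checkpoint with the lowest evaluation loss) matches the behaviour of `transformers`' built-in `EarlyStoppingCallback`. The commit does not include the training script, so the following is only a sketch of how such a setup is commonly written; the model and dataset variables are placeholders:

```python
# Sketch of the early-stopping setup the README describes; illustrative only.
from transformers import (
    EarlyStoppingCallback,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    WhisperForConditionalGeneration,
)

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

args = Seq2SeqTrainingArguments(
    output_dir="whisper-small-singlish",  # placeholder path
    eval_strategy="steps",        # named `evaluation_strategy` in older releases
    save_strategy="steps",        # must match eval_strategy for best-model loading
    load_best_model_at_end=True,  # final checkpoint = lowest eval_loss
    metric_for_best_model="eval_loss",
    greater_is_better=False,      # lower eval_loss is better
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # placeholder: prepared NSC Part 2 features
    eval_dataset=eval_dataset,    # placeholder: held-out evaluation split
    # stop once eval_loss fails to improve for two consecutive evaluations
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
trainer.train()
```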
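The Results table reports WER. As a sketch of how that metric is typically computed for a model card like this one, using the `evaluate` library (the author's evaluation code is not part of this commit; the strings below are placeholders):

```python
# Sketch: word error rate with the `evaluate` library; illustrative only.
import evaluate

wer_metric = evaluate.load("wer")

# Placeholders: in practice, predictions come from running the pipeline over
# the ~43k held-out samples, and references are the ground-truth transcripts.
predictions = ["then i take mrt to jurong east lor"]
references = ["then i take mrt to jurong east lah"]

wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}")
```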