wtlow003 committed
Commit eb7449f
1 Parent(s): ce81198

fix: README.md

Files changed (1):
  1. README.md +19 -5
README.md CHANGED
@@ -40,6 +40,13 @@ The recommended audio usage for testing should be:
 
 To use the model in an application, you can make use of `transformers`:
 
+```python
+# Use a pipeline as a high-level helper
+from transformers import pipeline
+
+pipe = pipeline("automatic-speech-recognition", model="jensenlwt/whisper-small-singlish-122k")
+```
+
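For reference, the pipeline above can then transcribe audio directly; a minimal usage sketch, where the audio file name is illustrative rather than a file shipped with the model:

```python
# Transcribe a local audio file with the pipeline defined above.
# "singlish_sample.wav" is an illustrative path.
result = pipe("singlish_sample.wav")
print(result["text"])
```

The pipeline handles feature extraction and decoding internally, so a file path (or an array of samples) is all it needs.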
 ### Out-of-Scope Use
 
 - Long form audio
 
@@ -47,16 +54,20 @@ To use the model in an application, you can make use of `transformers`:
 - Poor quality audio (audio samples are recorded in a controlled environment)
 - Conversation (as the model is not trained on conversation)
 
-
-## How to Get Started with the Model
-
-
 ## Training Details
 
 ### Training Data
 
+I made use of the [National Speech Corpus](https://www.imda.gov.sg/how-we-can-help/national-speech-corpus) for training.
+Specifically, I used **Part 2**, a series of prompted read-speech recordings that involve local named entities, slang, and dialect.
+
+To train, I made use of the first 300 transcripts in the corpus, which is around 122k samples from ~161 speakers.
+
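For illustration, audio/transcript pairs from a local copy of the corpus could be shaped into a `datasets.Dataset` along these lines; the file path, transcript, and column names are illustrative assumptions, not the actual NSC Part 2 layout:

```python
# Hedged sketch: building a datasets.Dataset from (audio path, transcript) pairs.
from datasets import Audio, Dataset

ds = Dataset.from_dict({
    "audio": ["part2/speaker001/0001.wav"],          # illustrative path
    "sentence": ["aiyah, why you never eat lunch"],  # illustrative transcript
})
# Decode and resample audio at 16 kHz, the sampling rate Whisper expects.
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
```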
 ### Training Procedure
 
+The model is fine-tuned with occasional interruptions to adjust the batch size to maximise GPU utilisation.
+In addition, I end training early if `eval_loss` does not decrease over two consecutive evaluation steps, in line with previous training experience.
+
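The early-stopping and best-checkpoint behaviour described above corresponds to standard `transformers` machinery; a minimal sketch, assuming a `Seq2SeqTrainer` setup in which `train_ds` and `eval_ds` are placeholders for the prepared NSC splits:

```python
# Hedged sketch: stop when eval_loss fails to improve for two consecutive
# evaluations, and keep the checkpoint with the lowest eval_loss.
from transformers import (
    EarlyStoppingCallback,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    WhisperForConditionalGeneration,
)

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

args = Seq2SeqTrainingArguments(
    output_dir="whisper-small-singlish",  # illustrative output path
    evaluation_strategy="steps",
    save_strategy="steps",
    load_best_model_at_end=True,          # restore the best checkpoint after training
    metric_for_best_model="eval_loss",
    greater_is_better=False,              # lower eval_loss is better
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,  # illustrative placeholder datasets
    eval_dataset=eval_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
trainer.train()
```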
 #### Training Hyperparameters
 
 The following hyperparameters are used:
 
@@ -85,10 +96,14 @@ The following hyperparameters are used:
 | 3500 | 4.581152 | 0.0484 | 0.1741 | 8.145801 |
 | 4000 | 5.235602 | 0.0401 | 0.1773 | 8.138047 |
 
+The model with the lowest evaluation loss is used as the final checkpoint.
+
 ### Testing Data, Factors & Metrics
 
 #### Testing Data
 
+To test the model, I made use of the last 100 transcripts (a held-out test set) in the corpus, which is around 43k samples.
+
 ### Results
 
 | Model | WER |
 
@@ -102,7 +117,6 @@ The following hyperparameters are used:
 
 ### Model Architecture and Objective
 
-
 ### Compute Infrastructure
 
 [More Information Needed]
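For the WER figures reported in the Results table, scores of this kind can be computed with the `evaluate` library; a minimal sketch with illustrative transcripts:

```python
# Hedged sketch: word error rate between reference and predicted transcripts.
import evaluate

wer = evaluate.load("wer")
references = ["the food at the hawker centre damn good"]   # illustrative
predictions = ["the food at the hawker center damn good"]  # illustrative
print(wer.compute(references=references, predictions=predictions))  # 0.125 (1 of 8 words)
```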