Commit 3eee831 by waveletdeboshir
Parent: d42bf8d

Update README.md

README.md CHANGED
@@ -118,11 +118,13 @@ base_model:
 This is a version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) model without number tokens (token ids corresponding to numbers are excluded).
 NO fine-tuning was used.
 
-Phrases with spoken numbers will be transcribed with numbers as words.
+Phrases with spoken numbers will be transcribed with numbers as words. It can be useful for TTS data preparation.
 
 **Example**: Instead of **"25"** this model will transcribe phrase as **"twenty five"**.
 
 ## Usage
+`transformers` version `4.45.2`
+
 Model can be used as an original whisper:
 
 ```python
@@ -131,12 +133,14 @@ Model can be used as an original whisper:
 
 >>> # load audio
 >>> wav, sr = torchaudio.load("audio.wav")
+>>> # resample if necessary
+>>> wav = torchaudio.functional.resample(wav, sr, 16000)
 
 >>> # load model and processor
 >>> processor = WhisperProcessor.from_pretrained("waveletdeboshir/whisper-large-v3-no-numbers")
 >>> model = WhisperForConditionalGeneration.from_pretrained("waveletdeboshir/whisper-large-v3-no-numbers")
 
->>> input_features = processor(wav[0], sampling_rate=
+>>> input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt").input_features
 
 >>> # generate token ids
 >>> predicted_ids = model.generate(input_features)
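
Note on "without number tokens": the README says that token ids corresponding to numbers are excluded, but the commit does not show how those ids were selected. The snippet below is only a minimal sketch of one way such ids might be identified, assuming a vocabulary that maps token text to token id. The name `find_number_token_ids` and the toy vocabulary are hypothetical and are not taken from the model repository.

```python
def find_number_token_ids(vocab):
    """Return the sorted ids of tokens whose text contains an ASCII digit."""
    return sorted(
        token_id
        for token, token_id in vocab.items()
        if any(ch in "0123456789" for ch in token)
    )


# Hypothetical toy vocabulary (token text -> token id); NOT the real Whisper vocabulary.
toy_vocab = {"twenty": 0, " five": 1, "25": 2, " 2": 3, "5th": 4, "hello": 5}

number_ids = find_number_token_ids(toy_vocab)

# Assumption: "excluding" is modeled here by simply dropping those entries.
excluded = set(number_ids)
kept_vocab = {tok: i for tok, i in toy_vocab.items() if i not in excluded}

print(number_ids)  # [2, 3, 4]
```

With digit-bearing tokens unavailable, a decoder can only spell a spoken number out in words, which matches the README's "twenty five" example.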