waveletdeboshir committed on
Commit
3eee831
1 Parent(s): d42bf8d

Update README.md

  1. README.md +6 -2
README.md CHANGED
@@ -118,11 +118,13 @@ base_model:
  This is a version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) model without number tokens (token ids corresponding to numbers are excluded).
  NO fine-tuning was used.
 
- Phrases with spoken numbers will be transcribed with numbers as words.
 
  **Example**: Instead of **"25"** this model will transcribe phrase as **"twenty five"**.
 
  ## Usage
  Model can be used as an original whisper:
 
  ```python
@@ -131,12 +133,14 @@ Model can be used as an original whisper:
 
  >>> # load audio
  >>> wav, sr = torchaudio.load("audio.wav")
 
  >>> # load model and processor
  >>> processor = WhisperProcessor.from_pretrained("waveletdeboshir/whisper-large-v3-no-numbers")
  >>> model = WhisperForConditionalGeneration.from_pretrained("waveletdeboshir/whisper-large-v3-no-numbers")
 
- >>> input_features = processor(wav[0], sampling_rate=sr, return_tensors="pt").input_features
 
  >>> # generate token ids
  >>> predicted_ids = model.generate(input_features)
 
  This is a version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) model without number tokens (token ids corresponding to numbers are excluded).
  NO fine-tuning was used.
 
+ Phrases with spoken numbers will be transcribed with numbers as words. It can be useful for TTS data preparation.
 
  **Example**: Instead of **"25"** this model will transcribe phrase as **"twenty five"**.
 
  ## Usage
+ `transformers` version `4.45.2`
+
  Model can be used as an original whisper:
 
  ```python

 
  >>> # load audio
  >>> wav, sr = torchaudio.load("audio.wav")
+ >>> # resample if necessary
+ >>> wav = torchaudio.functional.resample(wav, sr, 16000)
 
  >>> # load model and processor
  >>> processor = WhisperProcessor.from_pretrained("waveletdeboshir/whisper-large-v3-no-numbers")
  >>> model = WhisperForConditionalGeneration.from_pretrained("waveletdeboshir/whisper-large-v3-no-numbers")
 
+ >>> input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt").input_features
 
  >>> # generate token ids
  >>> predicted_ids = model.generate(input_features)