waveletdeboshir/whisper-large-v3-no-numbers

Automatic Speech Recognition

waveletdeboshir committed 21 days ago

Commit 3eee831 (1 parent: d42bf8d)

Update README.md

Files changed (1)

README.md +6 -2

@@ -118,11 +118,13 @@ base_model:
 This is a version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) model without number tokens (token ids corresponding to numbers are excluded).
 NO fine-tuning was used.
-Phrases with spoken numbers will be transcribed with numbers as words.
+Phrases with spoken numbers will be transcribed with numbers as words. It can be useful for TTS data preparation.
 **Example**: Instead of **"25"** this model will transcribe phrase as **"twenty five"**.
 ## Usage
+`transformers` version `4.45.2`
 Model can be used as an original whisper:
 ```python
@@ -131,12 +133,14 @@ Model can be used as an original whisper:
 >>> # load audio
 >>> wav, sr = torchaudio.load("audio.wav")
+>>> # resample if necessary
+>>> wav = torchaudio.functional.resample(wav, sr, 16000)
 >>> # load model and processor
 >>> processor = WhisperProcessor.from_pretrained("waveletdeboshir/whisper-large-v3-no-numbers")
 >>> model = WhisperForConditionalGeneration.from_pretrained("waveletdeboshir/whisper-large-v3-no-numbers")
->>> input_features = processor(wav[0], sampling_rate=sr, return_tensors="pt").input_features
+>>> input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt").input_features
 >>> # generate token ids
 >>> predicted_ids = model.generate(input_features)
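The README says only that token ids corresponding to numbers are excluded from openai/whisper-large-v3. It does not say how those ids were chosen. The sketch below is a minimal illustration, not the author's procedure. It assumes a token counts as a number if its text contains an ASCII digit 0-9, and it runs on a hypothetical toy vocabulary. It lists the matching ids and builds an old-to-new id mapping for the tokens that remain, which is the bookkeeping a vocabulary reduction of this kind needs.

```python
# Hypothetical illustration: the digit criterion and toy vocabulary below are
# assumptions, not taken from the whisper-large-v3-no-numbers repository.
ASCII_DIGITS = set("0123456789")


def find_number_token_ids(vocab: dict[str, int]) -> list[int]:
    """Return, sorted, the ids of tokens whose text contains an ASCII digit."""
    return sorted(i for tok, i in vocab.items() if ASCII_DIGITS & set(tok))


def build_id_remap(vocab_size: int, excluded: list[int]) -> dict[int, int]:
    """Map each kept old id to a new, contiguous id (excluded ids are dropped)."""
    excluded_set = set(excluded)
    kept = [i for i in range(vocab_size) if i not in excluded_set]
    return {old: new for new, old in enumerate(kept)}


if __name__ == "__main__":
    # Toy byte-level BPE-style vocabulary ("Ġ" marks a leading space).
    toy_vocab = {"hello": 0, "Ġ25": 1, "Ġtwenty": 2, "19": 3, "Ġfive": 4, "x2": 5}
    number_ids = find_number_token_ids(toy_vocab)
    print(number_ids)  # [1, 3, 5]
    print(build_id_remap(len(toy_vocab), number_ids))  # {0: 0, 2: 1, 4: 2}
```

Actually dropping ids in this way would also mean removing the matching rows from the embedding and output projection, so that model and tokenizer stay aligned. A lighter alternative keeps the original vocabulary and suppresses those ids at decoding time, for example with the `suppress_tokens` generation setting in `transformers`. This commit does not say which approach the repository uses.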
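The added sentence says the model can be useful for TTS data preparation. In that setting, transcripts are typically expected to spell numbers out as words so the text matches what is spoken. The following is a hedged sketch in which the function name and the dict-of-transcripts layout are hypothetical. It partitions transcripts into digit-free ones and ones that still contain digits and need a second look.

```python
import re

# For str patterns, \d matches any Unicode decimal digit, not only 0-9.
_DIGIT = re.compile(r"\d")


def partition_transcripts(
    transcripts: dict[str, str],
) -> tuple[dict[str, str], dict[str, str]]:
    """Split {utterance_id: text} into (digit_free, contains_digits)."""
    digit_free: dict[str, str] = {}
    contains_digits: dict[str, str] = {}
    for utt_id, text in transcripts.items():
        target = contains_digits if _DIGIT.search(text) else digit_free
        target[utt_id] = text
    return digit_free, contains_digits


if __name__ == "__main__":
    ok, flagged = partition_transcripts(
        {"utt1": "twenty five apples", "utt2": "25 apples"}
    )
    print(ok)       # {'utt1': 'twenty five apples'}
    print(flagged)  # {'utt2': '25 apples'}
```

For transcripts produced by this model, the digit group is expected to be empty or close to it, because number tokens are excluded. The check is still cheap to run on text that comes from other sources.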
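The second hunk fixes the usage snippet. The audio is now resampled to 16 kHz before feature extraction, and `sampling_rate=16000` is passed instead of the file's original `sr`. The Whisper feature extractor in `transformers` expects 16 kHz input and raises a `ValueError` when a different `sampling_rate` is passed, so the old snippet failed on, for example, 44.1 kHz files.

The "resample if necessary" step is sketched below in pure Python. The helper name is hypothetical. It uses linear interpolation without an anti-aliasing filter, a simplified stand-in for the `torchaudio.functional.resample` call in the README, meant only to show the control flow and the output-length arithmetic.

```python
TARGET_SR = 16_000  # sampling rate expected by Whisper's feature extractor


def resample_if_necessary(
    samples: list[float], sr: int, target_sr: int = TARGET_SR
) -> tuple[list[float], int]:
    """Return (samples at target_sr, target_sr).

    Simplified stand-in for torchaudio.functional.resample: linear
    interpolation with no anti-aliasing filter, for illustration only.
    """
    if sr == target_sr:
        return list(samples), sr  # already at the right rate: nothing to do
    n_out = round(len(samples) * target_sr / sr)
    out: list[float] = []
    for j in range(n_out):
        pos = j * sr / target_sr  # position of output sample j on the input grid
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[-1]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out, target_sr


if __name__ == "__main__":
    wav, sr = resample_if_necessary([0.0, 1.0, 2.0, 3.0], sr=32_000)
    print(wav, sr)  # [0.0, 2.0] 16000
```

In both the README and this sketch, the rate handed to the processor must describe the waveform actually passed to it. That is why the fixed snippet passes 16000 rather than `sr` after resampling.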