waveletdeboshir/whisper-large-v3-no-numbers

waveletdeboshir committed 26 days ago

Commit 021f430 • 1 Parent(s): 822a567

Update README.md

Files changed (1)

README.md +146 -3

@@ -1,3 +1,146 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+library_name: transformers
+pipeline_tag: automatic-speech-recognition
+tags:
+- asr
+- Pytorch
+- pruned
+- audio
+- automatic-speech-recognition
+language:
+- en
+- zh
+- de
+- es
+- ru
+- ko
+- fr
+- ja
+- pt
+- tr
+- pl
+- ca
+- nl
+- ar
+- sv
+- it
+- id
+- hi
+- fi
+- vi
+- he
+- uk
+- el
+- ms
+- cs
+- ro
+- da
+- hu
+- ta
+- no
+- th
+- ur
+- hr
+- bg
+- lt
+- la
+- mi
+- ml
+- cy
+- sk
+- te
+- fa
+- lv
+- bn
+- sr
+- az
+- sl
+- kn
+- et
+- mk
+- br
+- eu
+- is
+- hy
+- ne
+- mn
+- bs
+- kk
+- sq
+- sw
+- gl
+- mr
+- pa
+- si
+- km
+- sn
+- yo
+- so
+- af
+- oc
+- ka
+- be
+- tg
+- sd
+- gu
+- am
+- yi
+- lo
+- uz
+- fo
+- ht
+- ps
+- tk
+- nn
+- mt
+- sa
+- lb
+- my
+- bo
+- tl
+- mg
+- as
+- tt
+- haw
+- ln
+- ha
+- ba
+- jw
+- su
+---
+# Whisper-large-v3-no-numbers
+## Model info
+This is a version of the [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) model without number tokens (token ids corresponding to numbers are excluded).
+NO fine-tuning was used.
+Phrases with spoken numbers are transcribed with the numbers written out as words.
+Example: instead of "25", this model transcribes the phrase as "twenty five".
+## Usage
+The model can be used in the same way as the original Whisper:
+```python
+>>> from transformers import WhisperProcessor, WhisperForConditionalGeneration
+>>> import torchaudio
+>>> # load audio
+>>> wav, sr = torchaudio.load("audio.wav")
+>>> # Whisper expects 16 kHz audio, so resample if the file uses another rate
+>>> wav = torchaudio.functional.resample(wav, sr, 16000)
+>>> # load model and processor
+>>> processor = WhisperProcessor.from_pretrained("waveletdeboshir/whisper-large-v3-no-numbers")
+>>> model = WhisperForConditionalGeneration.from_pretrained("waveletdeboshir/whisper-large-v3-no-numbers")
+>>> # extract input features from the first audio channel
+>>> input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt").input_features
+>>> # generate token ids
+>>> predicted_ids = model.generate(input_features)
+>>> # decode token ids to text
+>>> transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)
+>>> transcription
+["<|startoftranscript|><|en|><|transcribe|><|notimestamps|> I'm twenty seven years old. <|endoftext|>"]
+```
+The context tokens can be removed from the start of the transcription by setting `skip_special_tokens=True`.
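
Note on the last line of the README: to make the effect of `skip_special_tokens=True` concrete, the sketch below strips the `<|...|>` markers from the example output shown in the diff using a plain regular expression. It is only an illustration of what gets removed, not how the tokenizer implements it; the helper name `strip_special_tokens` and the pattern are assumptions introduced here, and the real decoder may handle surrounding whitespace differently.

```python
import re

# Assumed pattern for Whisper-style special tokens such as <|en|> or <|endoftext|>.
SPECIAL_TOKEN = re.compile(r"<\|[^|]*\|>")


def strip_special_tokens(text: str) -> str:
    """Remove <|...|> markers, then trim leading and trailing whitespace."""
    return SPECIAL_TOKEN.sub("", text).strip()


# Example output copied from the README in the diff above.
raw = (
    "<|startoftranscript|><|en|><|transcribe|><|notimestamps|>"
    " I'm twenty seven years old. <|endoftext|>"
)
cleaned = strip_special_tokens(raw)
print(cleaned)
```

In practice, passing `skip_special_tokens=True` to `processor.batch_decode` is the simpler route; a helper like this is only useful when the raw decoded string is all that is available.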