waveletdeboshir committed on
Commit
021f430
1 Parent(s): 822a567

Update README.md

Files changed (1)
  1. README.md +146 -3
README.md CHANGED
@@ -1,3 +1,146 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: automatic-speech-recognition
+ tags:
+ - asr
+ - Pytorch
+ - pruned
+ - audio
+ - automatic-speech-recognition
+ language:
+ - en
+ - zh
+ - de
+ - es
+ - ru
+ - ko
+ - fr
+ - ja
+ - pt
+ - tr
+ - pl
+ - ca
+ - nl
+ - ar
+ - sv
+ - it
+ - id
+ - hi
+ - fi
+ - vi
+ - he
+ - uk
+ - el
+ - ms
+ - cs
+ - ro
+ - da
+ - hu
+ - ta
+ - no
+ - th
+ - ur
+ - hr
+ - bg
+ - lt
+ - la
+ - mi
+ - ml
+ - cy
+ - sk
+ - te
+ - fa
+ - lv
+ - bn
+ - sr
+ - az
+ - sl
+ - kn
+ - et
+ - mk
+ - br
+ - eu
+ - is
+ - hy
+ - ne
+ - mn
+ - bs
+ - kk
+ - sq
+ - sw
+ - gl
+ - mr
+ - pa
+ - si
+ - km
+ - sn
+ - yo
+ - so
+ - af
+ - oc
+ - ka
+ - be
+ - tg
+ - sd
+ - gu
+ - am
+ - yi
+ - lo
+ - uz
+ - fo
+ - ht
+ - ps
+ - tk
+ - nn
+ - mt
+ - sa
+ - lb
+ - my
+ - bo
+ - tl
+ - mg
+ - as
+ - tt
+ - haw
+ - ln
+ - ha
+ - ba
+ - jw
+ - su
+ ---
+
+ # Whisper-large-v3-no-numbers
+
+ ## Model info
+ This is a version of the [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) model with the number tokens removed (the token ids that correspond to numbers are excluded).
+ No fine-tuning was applied.
+
+ Phrases containing spoken numbers are transcribed with the numbers written out as words.
+
+ Example: instead of "25", this model transcribes the phrase as "twenty five".
+
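The card does not say how the number tokens were identified. As a rough sketch only, one plausible criterion is to scan a tokenizer vocabulary for tokens whose text contains a digit. The helper name `find_number_token_ids` and the toy vocabulary below are hypothetical and are not taken from the model repository.

```python
def find_number_token_ids(vocab):
    """Return the sorted ids of tokens whose text contains at least one digit.

    `vocab` maps token text to token id. This is an illustrative
    criterion, not necessarily the one used to build this model.
    """
    return sorted(
        token_id
        for token, token_id in vocab.items()
        if any(ch.isdigit() for ch in token)
    )


# Toy vocabulary standing in for a real tokenizer vocabulary (hypothetical ids).
toy_vocab = {"twenty": 0, "25": 1, " five": 2, " 7": 3, "years": 4, "v3": 5}
number_ids = find_number_token_ids(toy_vocab)
print(number_ids)
```

With a real tokenizer, the mapping could be obtained from `tokenizer.get_vocab()`. Note that a digit-based rule also catches mixed tokens such as `v3`, which may or may not be desirable.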
+ ## Usage
+ The model can be used in the same way as the original Whisper:
+
+ ```python
+ >>> from transformers import WhisperProcessor, WhisperForConditionalGeneration
+ >>> import torchaudio
+
+ >>> # load audio and resample to 16 kHz, the rate the Whisper feature extractor expects
+ >>> wav, sr = torchaudio.load("audio.wav")
+ >>> wav = torchaudio.functional.resample(wav, sr, 16000)
+ >>> # load model and processor
+ >>> processor = WhisperProcessor.from_pretrained("waveletdeboshir/whisper-large-v3-no-numbers")
+ >>> model = WhisperForConditionalGeneration.from_pretrained("waveletdeboshir/whisper-large-v3-no-numbers")
+
+ >>> input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt").input_features
+
+ >>> # generate token ids
+ >>> predicted_ids = model.generate(input_features)
+ >>> # decode token ids to text
+ >>> transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)
+ >>> transcription
+ ["<|startoftranscript|><|en|><|transcribe|><|notimestamps|> I'm twenty seven years old. <|endoftext|>"]
+ ```
+ The special (context) tokens can be stripped from the transcription by setting `skip_special_tokens=True`.
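
To illustrate the effect on the example output, the sketch below strips `<|...|>` markers from a decoded string with a regular expression. It only mimics the result for illustration: it is not how `batch_decode` is implemented, and the helper name `strip_special_tokens` is hypothetical.

```python
import re

# Whisper special tokens in the example output are written as <|name|>.
SPECIAL_TOKEN = re.compile(r"<\|[^|]*\|>")


def strip_special_tokens(text):
    """Remove <|...|> markers and leading/trailing whitespace from decoded text."""
    return SPECIAL_TOKEN.sub("", text).strip()


raw = "<|startoftranscript|><|en|><|transcribe|><|notimestamps|> I'm twenty seven years old. <|endoftext|>"
print(strip_special_tokens(raw))
```

In practice, passing `skip_special_tokens=True` to `batch_decode` is the simpler option, since the tokenizer already knows which tokens are special.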