Commit 32650eb
Parent(s): 913e947

Add docs for beam search decoding (#3)
- Add docs for beam search decoding (d4b7f76a904ff3f208d7162a871a1a78251c0845)
Co-authored-by: Vineel Pratap <[email protected]>
README.md
CHANGED
@@ -260,6 +260,72 @@ In the same way the language can be switched out for all other supported languag
 processor.tokenizer.vocab.keys()
 ```

+*Beam Search Decoding with Language Model*
+
+To run decoding with an n-gram language model, download the decoding config, which consists of the paths to the language model, lexicon, and token files, as well as the best decoding hyperparameters. Language models are only available for the 102 languages of the FLEURS dataset.
+
+```py
+import json
+from huggingface_hub import hf_hub_download
+
+# Download the decoding config, which points to the language model, lexicon and
+# token files and stores the best decoding hyperparameters per language.
+lm_decoding_config = {}
+lm_decoding_configfile = hf_hub_download(
+    repo_id="facebook/mms-cclms",
+    filename="decoding_config.json",
+    subfolder="mms-1b-all",
+)
+with open(lm_decoding_configfile) as f:
+    lm_decoding_config = json.loads(f.read())
+```
+
+Now, download all the files needed for decoding.
+
+```py
+# modify the ISO language code if using a different language.
+decoding_config = lm_decoding_config["eng"]
+
+lm_file = hf_hub_download(
+    repo_id="facebook/mms-cclms",
+    filename=decoding_config["lmfile"].rsplit("/", 1)[1],
+    subfolder=decoding_config["lmfile"].rsplit("/", 1)[0],
+)
+token_file = hf_hub_download(
+    repo_id="facebook/mms-cclms",
+    filename=decoding_config["tokensfile"].rsplit("/", 1)[1],
+    subfolder=decoding_config["tokensfile"].rsplit("/", 1)[0],
+)
+lexicon_file = None
+if decoding_config["lexiconfile"] is not None:
+    lexicon_file = hf_hub_download(
+        repo_id="facebook/mms-cclms",
+        filename=decoding_config["lexiconfile"].rsplit("/", 1)[1],
+        subfolder=decoding_config["lexiconfile"].rsplit("/", 1)[0],
+    )
+```
+
+Create the `torchaudio.models.decoder.CTCDecoder` object.
+
+```py
+from torchaudio.models.decoder import ctc_decoder
+
+beam_search_decoder = ctc_decoder(
+    lexicon=lexicon_file,
+    tokens=token_file,
+    lm=lm_file,
+    nbest=1,
+    beam_size=500,
+    beam_size_token=50,
+    lm_weight=float(decoding_config["lmweight"]),
+    word_score=float(decoding_config["wordscore"]),
+    sil_score=float(decoding_config["silweight"]),
+    blank_token="<s>",
+)
+```
+
+Passing the model output to the CTC decoder returns the transcription.
+
+```py
+beam_search_result = beam_search_decoder(outputs.to("cpu"))
+# take the words of the best hypothesis for the first utterance in the batch
+transcription = " ".join(beam_search_result[0][0].words).strip()
+```
 For more details, please have a look at [the official docs](https://huggingface.co/docs/transformers/main/en/model_doc/mms).

 ## Supported Languages
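
The decoding snippet in the diff above passes `outputs`, the emission matrix produced by the acoustic model earlier in the README, which is not shown in this change. Below is a minimal sketch of how those emissions might be produced and fed to the decoder; the `facebook/mms-1b-all` checkpoint and the silent placeholder waveform are assumptions standing in for the README's earlier inference example, not part of this commit.

```py
import numpy as np
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "facebook/mms-1b-all"  # assumed checkpoint; matches the "mms-1b-all" subfolder used above
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Placeholder input: one second of 16 kHz silence; replace with real mono audio sampled at 16 kHz.
audio_array = np.zeros(16_000, dtype=np.float32)

inputs = processor(audio_array, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    # Emission matrix of shape (batch, frames, vocab) that the beam-search decoder consumes.
    outputs = model(**inputs).logits

beam_search_result = beam_search_decoder(outputs.to("cpu"))
transcription = " ".join(beam_search_result[0][0].words).strip()
```

For languages other than English, the matching adapter and tokenizer target language would also need to be set as described in the earlier sections of the README.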