Commit 32650eb
Parent(s): 913e947

Add docs for beam search decoding (#3)
- Add docs for beam search decoding (d4b7f76a904ff3f208d7162a871a1a78251c0845)
Co-authored-by: Vineel Pratap <[email protected]>
README.md
CHANGED
@@ -260,6 +260,72 @@ In the same way the language can be switched out for all other supported languag
 processor.tokenizer.vocab.keys()
 ```

+*Beam Search Decoding with Language Model*
+
+To run decoding with an n-gram language model, download the decoding config, which consists of the paths to the language model, lexicon, and token files, as well as the best decoding hyperparameters. Language models are only available for the 102 languages of the FLEURS dataset.
+
+```py
+import json
+from huggingface_hub import hf_hub_download
+
+# Download the decoding config, which points to the language model, lexicon and
+# token files and stores the best decoding hyperparameters per language.
+lm_decoding_config = {}
+lm_decoding_configfile = hf_hub_download(
+    repo_id="facebook/mms-cclms",
+    filename="decoding_config.json",
+    subfolder="mms-1b-all",
+)
+with open(lm_decoding_configfile) as f:
+    lm_decoding_config = json.loads(f.read())
+```
+
+Now, download all the files needed for decoding.
+
+```py
+# modify the ISO language code if using a different language.
+decoding_config = lm_decoding_config["eng"]
+
+lm_file = hf_hub_download(
+    repo_id="facebook/mms-cclms",
+    filename=decoding_config["lmfile"].rsplit("/", 1)[1],
+    subfolder=decoding_config["lmfile"].rsplit("/", 1)[0],
+)
+token_file = hf_hub_download(
+    repo_id="facebook/mms-cclms",
+    filename=decoding_config["tokensfile"].rsplit("/", 1)[1],
+    subfolder=decoding_config["tokensfile"].rsplit("/", 1)[0],
+)
+lexicon_file = None
+if decoding_config["lexiconfile"] is not None:
+    lexicon_file = hf_hub_download(
+        repo_id="facebook/mms-cclms",
+        filename=decoding_config["lexiconfile"].rsplit("/", 1)[1],
+        subfolder=decoding_config["lexiconfile"].rsplit("/", 1)[0],
+    )
+```
+
+Create the `torchaudio.models.decoder.CTCDecoder` object.
+
+```py
+from torchaudio.models.decoder import ctc_decoder
+
+beam_search_decoder = ctc_decoder(
+    lexicon=lexicon_file,
+    tokens=token_file,
+    lm=lm_file,
+    nbest=1,
+    beam_size=500,
+    beam_size_token=50,
+    lm_weight=float(decoding_config["lmweight"]),
+    word_score=float(decoding_config["wordscore"]),
+    sil_score=float(decoding_config["silweight"]),
+    blank_token="<s>",
+)
+```
+
+Passing the model output to the CTC decoder returns the transcription.
+
+```py
+beam_search_result = beam_search_decoder(outputs.to("cpu"))
+# take the words of the best hypothesis for the first utterance in the batch
+transcription = " ".join(beam_search_result[0][0].words).strip()
+```
 For more details, please have a look at [the official docs](https://huggingface.co/docs/transformers/main/en/model_doc/mms).

 ## Supported Languages
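
The decoding snippet in the diff above passes `outputs`, the emission matrix produced by the acoustic model earlier in the README, which is not shown in this change. Below is a minimal sketch of how those emissions might be produced and fed to the decoder; the `facebook/mms-1b-all` checkpoint and the silent placeholder waveform are assumptions standing in for the README's earlier inference example, not part of this commit.

```py
import numpy as np
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "facebook/mms-1b-all"  # assumed checkpoint; matches the "mms-1b-all" subfolder used above
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Placeholder input: one second of 16 kHz silence; replace with real mono audio sampled at 16 kHz.
audio_array = np.zeros(16_000, dtype=np.float32)

inputs = processor(audio_array, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    # Emission matrix of shape (batch, frames, vocab) that the beam-search decoder consumes.
    outputs = model(**inputs).logits

beam_search_result = beam_search_decoder(outputs.to("cpu"))
transcription = " ".join(beam_search_result[0][0].words).strip()
```

For languages other than English, the matching adapter and tokenizer target language would also need to be set as described in the earlier sections of the README.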