nvidia/stt_en_fastconformer_hybrid_large_streaming_multi

Automatic Speech Recognition


vnoroozi committed on Oct 13, 2023

Commit b0f57de (1 parent: 903f87e)

Update README.md

Files changed (1)

README.md +7 -4


@@ -65,7 +65,8 @@ img {
 | [![Language](https://img.shields.io/badge/Language-en--US-lightgrey#model-badge)](#datasets)
 This collection contains large-size versions of cache-aware FastConformer-Hybrid (around 114M parameters) with multiple look-ahead support, trained on a large scale english speech.
-These models are trained for streaming ASR, which be used for streaming applications with a variety of latencies (0ms, 80ms, 480s, 1040ms).
+These models are trained for streaming ASR, which be used for streaming applications with a variety of latencies (0ms, 80ms, 480s, 1040ms).
+These are the worst latency and average latency of the model for each case would be half of these numbers.
 ## Model Architecture
@@ -108,7 +109,8 @@ Note: older versions of the model may have trained on smaller set of datasets.
 ## Performance
-The list of the available models in this collection is shown in the following tables for both CTC and Transducer decoders. Performances of the ASR models are reported in terms of Word Error Rate (WER%) with greedy decoding.
+The list of the available models in this collection is shown in the following tables for both CTC and Transducer decoders.
+Performances of the ASR models are reported in terms of Word Error Rate (WER%) with greedy decoding.
 ### Transducer Decoder
@@ -152,8 +154,9 @@ Then simply do:
 import nemo.collections.asr as nemo_asr
 asr_model = nemo_asr.models.EncDecHybridRNNTCTCBPEModel.from_pretrained(model_name="nvidia/stt_en_fastconformer_hybrid_large_streaming_multi")
-#Optional: change the default latency. Default latency is 1040ms. Supported latencies: {0: 0ms, 1: 80ms, 16: 480ms, 33: 1040ms}.
-asr_model.encoder.set_default_att_context_size(33)
+# Optional: change the default latency. Default latency is 1040ms. Supported latencies: {0: 0ms, 1: 80ms, 16: 480ms, 33: 1040ms}.
+# Note: These are the worst latency and average latency would be half of these numbers.
+asr_model.encoder.set_default_att_context_size([70,13])
 #Optional: change the default decoder. Default decoder is Transducer (RNNT). Supported decoders: {ctc, rnnt}.
 asr_model.change_decoding_strategy(decoder_type='rnnt')
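
Note on the latency figures: the new call passes `[70,13]`, while the comment beside it still lists integer keys. The 1040ms default is consistent with the second (look-ahead) entry if each encoder frame covers 80 ms, since 13 × 80 ms = 1040 ms. The sketch below only illustrates that arithmetic and the "average is half of the worst case" note. The 80 ms frame duration and both helper names are assumptions for illustration, not part of this commit or of the NeMo API, and nothing here loads the model.

```python
# Assumed encoder frame duration in milliseconds. This value is NOT stated in
# the commit; it is chosen because 13 * 80 == 1040 matches the default latency.
ASSUMED_FRAME_MS = 80


def worst_case_latency_ms(att_context_size, frame_ms=ASSUMED_FRAME_MS):
    """Return the worst-case latency implied by a [left, right] context pair.

    Hypothetical helper: only the right (look-ahead) entry contributes.
    """
    _left, right = att_context_size
    return right * frame_ms


def average_latency_ms(att_context_size, frame_ms=ASSUMED_FRAME_MS):
    """Return the average latency, taken as half of the worst case,
    following the note added to the README in this commit."""
    return worst_case_latency_ms(att_context_size, frame_ms) / 2


if __name__ == "__main__":
    ctx = [70, 13]  # the value passed to set_default_att_context_size in the diff
    print(ctx, worst_case_latency_ms(ctx), average_latency_ms(ctx))
```

Under the same assumption, a look-ahead of 0 frames gives 0 ms and a look-ahead of 1 frame gives 80 ms, which matches two of the latencies listed in the README text.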