Update README.md
README.md
CHANGED
@@ -65,7 +65,8 @@ img {
|
|
65 |
| [![Language](https://img.shields.io/badge/Language-en--US-lightgrey#model-badge)](#datasets)
|
66 |
|
67 |
This collection contains large-size versions of cache-aware FastConformer-Hybrid (around 114M parameters) with multiple look-ahead support, trained on large-scale English speech.
|
68 |
-
These models are trained for streaming ASR and can be used in streaming applications with a variety of latencies (0ms, 80ms, 480ms, 1040ms).
|
|
|
69 |
|
70 |
|
71 |
## Model Architecture
|
@@ -108,7 +109,8 @@ Note: older versions of the model may have trained on smaller set of datasets.
|
|
108 |
|
109 |
## Performance
|
110 |
|
111 |
-
The list of the available models in this collection is shown in the following tables for both CTC and Transducer decoders.
|
|
|
112 |
|
113 |
### Transducer Decoder
|
114 |
|
@@ -152,8 +154,9 @@ Then simply do:
|
|
152 |
import nemo.collections.asr as nemo_asr
|
153 |
asr_model = nemo_asr.models.EncDecHybridRNNTCTCBPEModel.from_pretrained(model_name="nvidia/stt_en_fastconformer_hybrid_large_streaming_multi")
|
154 |
|
155 |
-
#Optional: change the default latency. Default latency is 1040ms. Supported latencies: {0: 0ms, 1: 80ms, 16: 480ms, 33: 1040ms}.
|
156 |
-
|
|
|
157 |
|
158 |
#Optional: change the default decoder. Default decoder is Transducer (RNNT). Supported decoders: {ctc, rnnt}.
|
159 |
asr_model.change_decoding_strategy(decoder_type='rnnt')
|
|
|
65 |
| [![Language](https://img.shields.io/badge/Language-en--US-lightgrey#model-badge)](#datasets)
|
66 |
|
67 |
This collection contains large-size versions of cache-aware FastConformer-Hybrid (around 114M parameters) with multiple look-ahead support, trained on large-scale English speech.
|
68 |
+
These models are trained for streaming ASR and can be used in streaming applications with a variety of latencies (0ms, 80ms, 480ms, 1040ms).
|
69 |
+
These are worst-case latencies; the average latency of the model in each case would be half of the listed value.
|
70 |
|
71 |
|
72 |
## Model Architecture
|
|
|
109 |
|
110 |
## Performance
|
111 |
|
112 |
+
The list of the available models in this collection is shown in the following tables for both CTC and Transducer decoders.
|
113 |
+
Performance of the ASR models is reported in terms of Word Error Rate (WER%) with greedy decoding.
|
114 |
|
115 |
### Transducer Decoder
|
116 |
|
|
|
154 |
import nemo.collections.asr as nemo_asr
|
155 |
asr_model = nemo_asr.models.EncDecHybridRNNTCTCBPEModel.from_pretrained(model_name="nvidia/stt_en_fastconformer_hybrid_large_streaming_multi")
|
156 |
|
157 |
+
# Optional: change the default latency. Default latency is 1040ms. Supported latencies: {0: 0ms, 1: 80ms, 16: 480ms, 33: 1040ms}.
|
158 |
+
# Note: These are worst-case latencies; the average latency would be half of these numbers.
|
159 |
+
asr_model.encoder.set_default_att_context_size([70,13])
|
160 |
|
161 |
# Optional: change the default decoder. Default decoder is Transducer (RNNT). Supported decoders: {ctc, rnnt}.
|
162 |
asr_model.change_decoding_strategy(decoder_type='rnnt')
|
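The latency note added in this change can be illustrated with a small, self-contained sketch. It assumes only what the README text states: the listed latencies are worst-case values and the average is half of each. The names `WORST_CASE_LATENCY_MS` and `average_latency_ms` are hypothetical helpers for illustration and are not part of NeMo.

```python
# Worst-case latencies in milliseconds, as listed in the README text.
WORST_CASE_LATENCY_MS = [0, 80, 480, 1040]


def average_latency_ms(worst_case_ms: float) -> float:
    """Return the average latency, taken as half the worst case per the README note."""
    return worst_case_ms / 2


if __name__ == "__main__":
    for worst in WORST_CASE_LATENCY_MS:
        avg = average_latency_ms(worst)
        print(f"worst-case {worst} ms -> average {avg:.0f} ms")
```

Under that assumption, the default 1040ms setting corresponds to an average latency of 520ms.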