Update README.md
README.md
CHANGED
@@ -31,7 +31,7 @@ img {
| [![Riva Compatible](https://img.shields.io/badge/NVIDIA%20Riva-compatible-brightgreen#model-badge)](#deployment-with-nvidia-riva) |

This model transcribes speech in lowercase Ukrainian alphabet including spaces and apostrophes, and is trained on 69 hours of Ukrainian speech data.
-It is a non-autoregressive "large" variant of Streaming Citrinet, with around 141 million parameters.
+It is a non-autoregressive "large" variant of Streaming Citrinet, with around 141 million parameters. The model is fine-tuned from a pre-trained Russian Citrinet-1024 model on Ukrainian speech data using the cross-language transfer learning [4] approach.
See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-ctc) for complete architecture details.
It is also compatible with NVIDIA Riva for [production-grade server deployments](#deployment-with-nvidia-riva).
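The added sentence above references cross-language transfer learning from a Russian Citrinet-1024 checkpoint. As a rough illustration only, a minimal NeMo-style sketch of that workflow is shown below; the pre-trained checkpoint name, tokenizer directory, manifest paths, and training hyperparameters are illustrative assumptions, not details taken from this model card.

```python
# Rough sketch (not the authors' exact recipe) of cross-language transfer
# learning in NeMo: start from a pre-trained Russian Citrinet-1024 checkpoint,
# swap in a Ukrainian BPE tokenizer, then fine-tune on Ukrainian data.
# Checkpoint name, tokenizer dir, manifest paths, and hyperparameters are assumed.
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr

# Load a pre-trained Russian Citrinet-1024 CTC model (name assumed here).
asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
    model_name="stt_ru_citrinet_1024_gamma_0_25"
)

# Replace the Russian tokenizer with a Ukrainian SentencePiece BPE tokenizer
# built from the Ukrainian training transcripts; the output layer is rebuilt
# for the new vocabulary while the acoustic encoder weights carry over.
asr_model.change_vocabulary(
    new_tokenizer_dir="tokenizers/uk_bpe_1024",
    new_tokenizer_type="bpe",
)

# Point the model at Ukrainian Common Voice manifests (paths assumed).
asr_model.setup_training_data(
    train_data_config={
        "manifest_filepath": "manifests/cv10_uk_train.json",
        "sample_rate": 16000,
        "batch_size": 16,
        "shuffle": True,
    }
)
asr_model.setup_validation_data(
    val_data_config={
        "manifest_filepath": "manifests/cv10_uk_dev.json",
        "sample_rate": 16000,
        "batch_size": 16,
        "shuffle": False,
    }
)

# Fine-tune the whole network on the new language.
trainer = pl.Trainer(devices=1, accelerator="gpu", max_epochs=100)
asr_model.set_trainer(trainer)
trainer.fit(asr_model)
```

The idea is that the acoustic encoder trained on Russian transfers to the closely related Ukrainian task, while the tokenizer and output vocabulary are rebuilt for the new language.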
@@ -88,7 +88,7 @@ The tokenizer for this models was built using the text transcripts of the train

### Datasets

-Model is trained on Mozilla Common Voice Corpus 10.0 dataset comprising of 69 hours of Ukrainian speech.
+The model is trained on the validated Mozilla Common Voice Corpus 10.0 data (excluding the dev and test sets), comprising 69 hours of Ukrainian speech.

## Limitations
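For readers who want to assemble a comparable training subset, here is a hedged sketch of turning the validated Common Voice 10.0 (uk) split, minus the dev and test clips, into a NeMo-style manifest. The directory layout, the assumption that clips were already converted to 16 kHz mono wav, and the output path are illustrative, not details from this card.

```python
# Hypothetical sketch: build a NeMo-style training manifest from the
# "validated" split of Common Voice 10.0 (uk), excluding clips listed in
# dev.tsv or test.tsv. Assumes mp3 clips were already converted to
# 16 kHz mono wav files under clips_wav/; all paths are illustrative.
import json
from pathlib import Path

import pandas as pd
import soundfile as sf

cv_root = Path("cv-corpus-10.0-2022-07-04/uk")  # assumed local layout
wav_dir = cv_root / "clips_wav"                 # assumed pre-converted wavs

validated = pd.read_csv(cv_root / "validated.tsv", sep="\t")

# Drop clips reserved for the dev and test sets.
held_out = set()
for split in ("dev.tsv", "test.tsv"):
    held_out |= set(pd.read_csv(cv_root / split, sep="\t")["path"])
train = validated[~validated["path"].isin(held_out)]

Path("manifests").mkdir(exist_ok=True)
with open("manifests/cv10_uk_train.json", "w", encoding="utf-8") as fout:
    for _, row in train.iterrows():
        wav_path = wav_dir / Path(row["path"]).with_suffix(".wav").name
        entry = {
            "audio_filepath": str(wav_path),
            "duration": sf.info(str(wav_path)).duration,
            "text": row["sentence"].lower(),  # card states lowercase transcripts
        }
        fout.write(json.dumps(entry, ensure_ascii=False) + "\n")
```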
@@ -107,4 +107,5 @@ Check out [Riva live demo](https://developer.nvidia.com/riva#demos).

[1] [Citrinet: Closing the Gap between Non-Autoregressive and Autoregressive End-to-End Models for Automatic Speech Recognition](https://arxiv.org/abs/2104.01721) <br />
[2] [Google Sentencepiece Tokenizer](https://github.com/google/sentencepiece) <br />
-[3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
+[3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo) <br />
+[4] [Cross-Language Transfer Learning](https://scholar.google.com/citations?view_op=view_citation&hl=en&user=qmmIGnwAAAAJ&sortby=pubdate&citation_for_view=qmmIGnwAAAAJ:PVjk1bu6vJQC)