gigant/romanian-wav2vec2

Automatic Speech Recognition · hf-asr-leaderboard · robust-speech-event · Inference Endpoints


gigant committed on Feb 9, 2022

Commit ea5cba6 (1 parent: 69de36e)

Update README.md

Files changed (1)

README.md +10 -5

README.md CHANGED

@@ -56,18 +56,18 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# wav2vec2-ro-300m_01
-This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the [Common Voice 8.0 - Romanian subset](https://huggingface.co/datasets/mozilla-foundation/common_voice_8_0) dataset, with extra training data from [Romanian Speech Synthesis](https://huggingface.co/datasets/gigant/romanian_speech_synthesis_0_8_1) dataset.
-It achieves the following results on the evaluation set:
 - Loss: 0.1553
 - Wer: 0.1174
 - Cer: 0.0294
 ## Model description
-More information needed
 ## Intended uses & limitations
@@ -75,7 +75,12 @@ More information needed
 ## Training and evaluation data
-More information needed
 ## Training procedure

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+# Romanian Wav2Vec2
+This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the [Common Voice 8.0 - Romanian subset](https://huggingface.co/datasets/mozilla-foundation/common_voice_8_0) dataset (train + validation + other splits), with extra training data from the [Romanian Speech Synthesis](https://huggingface.co/datasets/gigant/romanian_speech_synthesis_0_8_1) dataset (train + test splits).
+Without 5-gram language model decoding, it achieves the following results on the evaluation set (Common Voice 8.0, Romanian subset, test split):
 - Loss: 0.1553
 - Wer: 0.1174
 - Cer: 0.0294
 ## Model description
+The architecture is based on [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) with a speech recognition CTC head and an added 5-gram language model (using [pyctcdecode](https://github.com/kensho-technologies/pyctcdecode) and [kenlm](https://github.com/kpu/kenlm)). Those libraries are needed in order for the language model-boosted decoder to work.
 ## Intended uses & limitations
 ## Training and evaluation data
+Training data:
+- [Common Voice 8.0 - Romanian subset](https://huggingface.co/datasets/mozilla-foundation/common_voice_8_0): train + validation + other splits
+- [Romanian Speech Synthesis](https://huggingface.co/datasets/gigant/romanian_speech_synthesis_0_8_1): train + test splits
+Evaluation data:
+- [Common Voice 8.0 - Romanian subset](https://huggingface.co/datasets/mozilla-foundation/common_voice_8_0): test split
 ## Training procedure
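The card reports WER and CER on the Common Voice 8.0 Romanian test split. Both are edit-distance rates: the number of substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the reference length in words (WER) or characters (CER). The following is a plain-Python illustration, not the evaluation script used for this model. It assumes corpus-level aggregation (total errors over total reference length, as jiwer-based metrics do) and applies no text normalization; either choice can shift the reported numbers.

```python
from typing import Hashable, List, Sequence


def edit_distance(ref: Sequence[Hashable], hyp: Sequence[Hashable]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    # prev[j] = distance between the first i-1 items of ref and the first j of hyp
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # delete r from the reference side
                curr[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),   # substitute (cost 0 if equal)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for one utterance (assumes a non-empty reference)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for one utterance (spaces count as characters)."""
    return edit_distance(reference, hypothesis) / len(reference)


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level WER: summed word errors over summed reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    total = sum(len(r.split()) for r in references)
    return errors / total


print(wer("ana are mere", "ana are pere"))  # one wrong word out of three
print(cer("ana are mere", "ana are pere"))  # one wrong character out of twelve
```

Corpus-level WER is not the mean of per-utterance WERs: for references `["a b c", "d e"]` and hypotheses `["a b x", "d e"]` it is 1/5 = 0.2, whereas averaging the two utterance scores would give about 0.167.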
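The model description mentions a CTC head. Results quoted without the 5-gram language model come from decoding the CTC output directly, typically by greedy (best-path) decoding: take the most likely token for each audio frame, merge consecutive repeats, then drop blank tokens. The sketch below assumes a toy vocabulary in which `<pad>` (id 0) acts as the CTC blank and `|` marks word boundaries. This follows the usual Hugging Face wav2vec2 tokenizer convention, but the real vocabulary of this model is not shown here. With the language model enabled, pyctcdecode replaces this step with a beam search that also weighs kenlm 5-gram scores.

```python
from typing import List, Sequence


def ctc_greedy_decode(
    logits: Sequence[Sequence[float]],  # one row of token scores per audio frame
    vocab: List[str],
    blank_id: int,
    word_delimiter: str = "|",
) -> str:
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    tokens: List[str] = []
    prev = None
    for frame in logits:
        best = max(range(len(frame)), key=frame.__getitem__)  # most likely token id
        # Keep a token only when it differs from the previous frame's token and is
        # not the blank; a blank between two equal tokens lets a double letter survive.
        if best != prev and best != blank_id:
            tokens.append(vocab[best])
        prev = best
    # The word delimiter token becomes a space in the final transcript.
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary; <pad> doubles as the CTC blank.
VOCAB = ["<pad>", "a", "n", "|", "m", "e", "r"]


def one_hot(ids: List[int]) -> List[List[float]]:
    """Build fake per-frame scores whose argmax is the given id sequence."""
    return [[1.0 if j == i else 0.0 for j in range(len(VOCAB))] for i in ids]


frames = one_hot([1, 1, 0, 2, 2, 1, 3, 3, 4, 5, 6, 5])
print(ctc_greedy_decode(frames, VOCAB, blank_id=0))
```

Greedy decoding picks each frame independently, so it cannot use word-level context. That is the gap the kenlm 5-gram model fills when pyctcdecode rescores beam hypotheses.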