lucio/xls-r-uzbek-cv8

@@ -20,24 +20,24 @@ model-index:
       type: mozilla-foundation/common_voice_8_0
       args: uz
     metrics:
-       - name: Test WER (no LM)
-         type: wer
-         value: 32.88
-       - name: Test CER (no LM)
-         type: cer
-         value: 6.53
        - name: Test WER (with LM)
          type: wer
          value: 15.065
        - name: Test CER (with LM)
          type: cer
          value: 3.077
 ---
 # XLS-R-300M Uzbek CV8
 This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - UZ dataset.
-It achieves the following results on the evaluation set:
 - Loss: 0.3063
 - Wer: 0.3852
 - Cer: 0.0777
@@ -49,6 +49,8 @@ For a description of the model architecture, see [facebook/wav2vec2-xls-r-300m](
 The model vocabulary consists of the [Modern Latin alphabet for Uzbek](https://en.wikipedia.org/wiki/Uzbek_alphabet), with punctuation removed.
 Note that the characters <‘> and <’> do not count as punctuation, as <‘> modifies \<o\> and \<g\>, and <’> indicates the glottal stop or a long vowel.
 ## Intended uses & limitations
 This model is expected to be of some utility for low-fidelity use cases such as:
@@ -61,7 +63,7 @@ The model is not reliable enough to use as a substitute for live captions for ac
 The 50% of the `train` common voice official split was used as training data. The 50% of the official `dev` split was used as validation data, and the full `test` set was used for final evaluation of the model without LM, while the model with LM was evaluated only on 500 examples from the `test` set.
-The kenlm language model was compiled from the target sentences of the train + other datasets.
 ### Training hyperparameters

       type: mozilla-foundation/common_voice_8_0
       args: uz
     metrics:
        - name: Test WER (with LM)
          type: wer
          value: 15.065
        - name: Test CER (with LM)
          type: cer
          value: 3.077
+       - name: Test WER (no LM)
+         type: wer
+         value: 32.88
+       - name: Test CER (no LM)
+         type: cer
+         value: 6.53
 ---
 # XLS-R-300M Uzbek CV8
 This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - UZ dataset.
+It achieves the following results on the validation set:
 - Loss: 0.3063
 - Wer: 0.3852
 - Cer: 0.0777
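For readers unfamiliar with the metrics, the following is a minimal sketch of how word error rate (WER) and character error rate (CER) are commonly computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length. It is not the evaluation script used for this model. The function names `edit_distance`, `wer`, and `cer` and the sample sentences are illustrative assumptions. The sketch returns fractions, like the `Wer` and `Cer` figures above; the values in the metadata block appear to be the same kind of quantity expressed as percentages.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings of characters)."""
    # prev[j] holds the distance between the first i-1 items of ref and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from the first i items of ref to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate as a fraction; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate as a fraction (spaces counted); assumes a non-empty reference."""
    return edit_distance(reference, hypothesis) / len(reference)


# Illustrative strings only, not taken from the dataset.
reference = "men bugun maktabga bordim"
hypothesis = "men bugun maktabda bordim"
print(wer(reference, hypothesis))
print(cer(reference, hypothesis))
```

One substituted word out of four gives a WER of 0.25, while the same error touches only one character out of 25, which is why CER is typically much lower than WER for the same output.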
 The model vocabulary consists of the [Modern Latin alphabet for Uzbek](https://en.wikipedia.org/wiki/Uzbek_alphabet), with punctuation removed.
 Note that the characters <‘> and <’> do not count as punctuation, as <‘> modifies \<o\> and \<g\>, and <’> indicates the glottal stop or a long vowel.
+The decoder uses a kenlm language model built on common_voice text.
 ## Intended uses & limitations
 This model is expected to be of some utility for low-fidelity use cases such as:
 The 50% of the `train` common voice official split was used as training data. The 50% of the official `dev` split was used as validation data, and the full `test` set was used for final evaluation of the model without LM, while the model with LM was evaluated only on 500 examples from the `test` set.
+The kenlm language model was compiled from the target sentences of the train + other dataset splits.
 ### Training hyperparameters