metadata

license: gpl-3.0
language:
  - be
tags:
  - audio
  - speech
  - automatic-speech-recognition
datasets:
  - mozilla-foundation/common_voice_8_0
metrics:
  - wer
model-index:
  - name: wav2vec2
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 8
          type: mozilla-foundation/common_voice_8_0
          args: be
        metrics:
          - name: Dev WER
            type: wer
            value: 17.61
          - name: Test WER
            type: wer
            value: 18.7
          - name: Dev WER (with LM)
            type: wer
            value: 11.5
          - name: Test WER (with LM)
            type: wer
            value: 12.4

Automatic Speech Recognition for the Belarusian language

A fine-tuned version of facebook/wav2vec2-base, trained on the Belarusian (be) subset of the mozilla-foundation/common_voice_8_0 dataset.

The Train, Dev, and Test splits were used exactly as they appear in the dataset. No additional data from the Validated split was used, so only one voicing of each sentence was included, which is how CommonVoice CorporaCreator splits the data. To build a better model, one can enlarge the Train, Dev, and Test splits with additional voicings from the Validated split for sentences already present in those splits.
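
The enlargement described above can be sketched as follows. This is a minimal illustration, not the procedure used for this model: the `(clip, sentence)` pair layout, the function name `enlarge_splits`, and the toy file names are assumptions made for the example, and real Common Voice TSV files carry more columns.

```python
def enlarge_splits(splits, validated):
    """Add extra voicings from `validated` to the split that already holds the sentence.

    splits: dict mapping split name -> list of (clip, sentence) pairs
    validated: list of (clip, sentence) pairs
    Both layouts are hypothetical simplifications of the Common Voice TSV files.
    """
    # Map each sentence to the one split it already belongs to.
    sentence_to_split = {}
    for name, rows in splits.items():
        for _clip, sentence in rows:
            sentence_to_split[sentence] = name

    # Track clips already present so nothing is added twice.
    seen_clips = {clip for rows in splits.values() for clip, _ in rows}

    enlarged = {name: list(rows) for name, rows in splits.items()}
    for clip, sentence in validated:
        target = sentence_to_split.get(sentence)
        if target is not None and clip not in seen_clips:
            enlarged[target].append((clip, sentence))
            seen_clips.add(clip)
    return enlarged


splits = {
    "train": [("a1.mp3", "добры дзень")],
    "dev": [("b1.mp3", "дзякуй")],
    "test": [("c1.mp3", "да пабачэння")],
}
validated = [
    ("a1.mp3", "добры дзень"),  # already in train, skipped
    ("a2.mp3", "добры дзень"),  # extra voicing, goes to train
    ("b2.mp3", "дзякуй"),       # extra voicing, goes to dev
    ("d1.mp3", "новы сказ"),    # sentence is in no split, ignored
]
enlarged = enlarge_splits(splits, validated)
```

Sentences that appear in no split are left out, so each sentence still belongs to exactly one split and no text leaks between Train, Dev, and Test.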

The language model is a 5-gram model built with KenLM on sentences from the Train + (Other - Dev - Test) splits of the mozilla-foundation/common_voice_8_0 be dataset.
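
The split expression Train + (Other - Dev - Test) can be read as a set operation over sentences. The sketch below shows one way to assemble such a corpus. The function name `build_lm_corpus`, the toy sentences, and the choice to deduplicate are assumptions for illustration, not the original build script.

```python
def build_lm_corpus(train, other, dev, test):
    """Return sentences from Train + (Other - Dev - Test), in order of first appearance."""
    # Sentences reserved for evaluation must not reach the language model.
    held_out = set(dev) | set(test)
    candidates = list(train) + [s for s in other if s not in held_out]

    corpus = []
    seen = set()
    for sentence in candidates:
        # Deduplication is an assumption of this sketch, not a documented step.
        if sentence not in seen:
            seen.add(sentence)
            corpus.append(sentence)
    return corpus


corpus = build_lm_corpus(
    train=["а б", "в г"],
    other=["в г", "д е", "ж з", "д е"],
    dev=["д е"],
    test=["і к"],
)
print("\n".join(corpus))
```

Writing the result one sentence per line gives a text file in the form KenLM's `lmplz` reads, for example `lmplz -o 5 < corpus.txt > lm.arpa` for a 5-gram model.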

Source code is available here.

Run the model in a browser

This page contains an interactive demo widget that lets you test the model right in a browser.

However, the widget uses only the acoustic model, without the language model that significantly improves overall performance (a test WER of 12.4 with the language model versus 18.7 without it).

You can try the full pipeline of acoustic model + language model on the following Spaces page (it also works in a browser).
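
To put a number on how much the language model helps, the relative WER reduction can be computed from the values in this card's metadata. This is a small arithmetic sketch; the helper name `relative_wer_reduction` is an assumption made for the example.

```python
def relative_wer_reduction(wer_without_lm, wer_with_lm):
    """Relative WER reduction, in percent, from adding the language model."""
    return 100.0 * (wer_without_lm - wer_with_lm) / wer_without_lm


# WER values taken from the model-index metadata of this card.
dev_gain = relative_wer_reduction(17.61, 11.5)
test_gain = relative_wer_reduction(18.7, 12.4)
print(f"Dev:  {dev_gain:.1f}% relative WER reduction")
print(f"Test: {test_gain:.1f}% relative WER reduction")
```

On both splits the language model removes roughly a third of the word errors.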