license: gpl-3.0
language:
- be
tags:
- audio
- speech
- automatic-speech-recognition
datasets:
- mozilla-foundation/common_voice_8_0
metrics:
- wer
model-index:
- name: wav2vec2
results:
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: Common Voice 8
type: mozilla-foundation/common_voice_8_0
args: be
metrics:
- name: Dev WER
type: wer
value: 17.61
- name: Test WER
type: wer
value: 18.7
- name: Dev WER (with LM)
type: wer
value: 11.5
- name: Test WER (with LM)
type: wer
value: 12.4
Automatic Speech Recognition for Belarusian language
Fine-tuned version of facebook/wav2vec2-base on mozilla-foundation/common_voice_8_0 be
dataset.
Train
, Dev
, Test
splits were used as they are present in the dataset. No additional data was used from Validated
split,
only 1 voicing of each sentence was used - the way the data was split by CommonVoice CorporaCreator.
To build a better model one can use additional voicings from Validated
split for sentences already present in Train
, Dev
, Test
splits,
i.e. enlarge mentioned splits.
Language model was built using KenLM.
5-gram Language model was built on sentences from Train + (Other - Dev - Test)
splits of mozilla-foundation/common_voice_8_0 be
dataset.
Source code is available here.
Run model in a browser
This page contains interactive demo widget that lets you test this model right in a browser.
However, this widget uses Acoustic model only without Language model that significantly improves overall performance.
You can play with full pipeline of Acoustic model + Language model on the following spaces page (also works from browser).