wav2vec2-large-xls-r-300m-cv8-nl

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the common_voice dataset. In addition, a 6-gram KenLM language model was trained on the Common Voice 8 train and validation splits and used for decoding. The results on the Common Voice 8 test set are listed in the evaluation results of this model card.

Model description

Dutch wav2vec2-xls-r-300m model fine-tuned on the Common Voice 8 dataset.
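As a rough illustration of how a checkpoint like this is typically loaded, the sketch below uses the Transformers `pipeline` API. It is not taken from the original card. It assumes the checkpoint is published under the repository id shown further down this card, that `transformers` and `torch` are installed, and that `ffmpeg` is available for decoding audio files. The helper name `transcribe` is hypothetical, as is the file name in the usage note. Whether the KenLM decoder is picked up depends on `pyctcdecode` and `kenlm` being installed and on the language model files being present in the repository, which is an assumption here.

```python
# Repository id as it appears in this model card.
MODEL_ID = "RuudVelo/wav2vec2-large-xls-r-300m-cv8-nl"


def transcribe(audio_path: str) -> str:
    """Transcribe one Dutch audio file (hypothetical helper, not from the card)."""
    # Imported lazily so this module can be imported without the heavy dependency.
    from transformers import pipeline

    # Downloads the checkpoint on first use. The pipeline resamples file input
    # to the sampling rate expected by the model's feature extractor.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    return asr(audio_path)["text"]
```

A call such as `transcribe("sample_nl.wav")` would return the recognized text as a string. For many files it is cheaper to build the pipeline once and reuse it rather than constructing it on every call.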

Intended uses & limitations

More information needed

Training and evaluation data

The model was trained on the Dutch portion of Common Voice 8 for 75 epochs. The training set was the Common Voice 8 train split and the evaluation set was the Common Voice 8 validation split. The reported WER is on the Common Voice 8 test split, which was used for neither training nor validation (eval).
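For readers unfamiliar with the metrics, the following is a minimal, dependency-free sketch of how WER and CER are commonly computed from reference and predicted transcripts. It is not the evaluation script used for this model. It assumes scores are expressed as percentages, that words are split on whitespace, and that CER counts spaces as characters. Text normalization (casing, punctuation) is left out and would change the numbers. All function names and example sentences are hypothetical.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level word error rate in percent."""
    errors = sum(edit_distance(r.split(), h.split())
                 for r, h in zip(references, hypotheses))
    total_words = sum(len(r.split()) for r in references)
    return 100.0 * errors / total_words


def cer(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level character error rate in percent (spaces are counted)."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return 100.0 * errors / total_chars


if __name__ == "__main__":
    # Made-up example pair, not drawn from Common Voice.
    refs = ["de kat zit op de mat"]
    hyps = ["de kat zit op mat"]
    print(f"WER: {wer(refs, hyps):.2f}  CER: {cer(refs, hyps):.2f}")
```

Errors are summed over the whole corpus before dividing, so long utterances weigh more than short ones. Averaging per-utterance rates instead would give a different figure.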

Training procedure

Training hyperparameters

Framework versions

Transformers 4.16.0.dev0
Pytorch 1.10.1+cu102
Datasets 1.18.1
Tokenizers 0.11.0

Dataset used to train RuudVelo/wav2vec2-large-xls-r-300m-cv8-nl

Evaluation results

Test WER on Common Voice 8 (self-reported): 14.530
Test CER on Common Voice 8 (self-reported): 4.700
Test WER on Robust Speech Event - Dev Data (self-reported): 33.700
Test CER on Robust Speech Event - Dev Data (self-reported): 15.640
Test WER on Robust Speech Event - Test Data (self-reported): 35.190
