wav2vec2-large-xls-r-300m-frisian-cv-8

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on the common_voice_8_0 dataset. It achieves the following results on the evaluation set:

Loss: 0.0707
Wer: 0.0724

And on the test set:

Wer: 0.0710
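
For readers unfamiliar with the metric: WER (word error rate) is conventionally the word-level edit distance between the model's transcript and the reference, divided by the number of reference words. The snippet below is a minimal sketch of that definition in plain Python. It is not the evaluation code used for this model; the function name `word_error_rate` and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Before processing reference word i, prev[j] is the edit distance
    # between the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (not taken from the dataset): one dropped word
# out of six reference words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Under this definition, a WER of 0.0710 corresponds to roughly 7 word-level edits per 100 reference words.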

Model description

This model was developed for my Master's thesis in "Voice Technology" at Rijksuniversiteit Groningen - Campus Fryslân. It corresponds to experiment 6, where I use as the training set all validated data (~50 hours) except the test and evaluation sets (~4.5 hours each). This leaves about 41 hours of Frisian speech for training (50 - 2 × 4.5 = 41). It differs from experiment 2 in that I fine-tune the 300M (0.3B) parameter version of XLS-R.

Intended uses & limitations

The model is intended for automatic recognition of Frisian speech.
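
A minimal sketch of how transcription could look with the Transformers `pipeline` API follows. The Hub ID in `MODEL_ID` is an assumption inferred from this page's repository name, `sample.wav` is a hypothetical file, and `transcribe` is an illustrative helper. Passing a file path to the pipeline also requires ffmpeg to be installed for audio decoding.

```python
# Assumed Hub ID, inferred from the repository name on this page.
MODEL_ID = "greenw0lf/wav2vec2-large-xls-r-300m-frisian-cv-8"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Return the transcript of one audio file (illustrative helper)."""
    # Imported lazily so that defining this helper does not require
    # transformers to be installed or the model to be downloaded.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # For a single input, the ASR pipeline returns a dict with a "text" key.
    return asr(audio_path)["text"]


# Example call (hypothetical file name):
# print(transcribe("sample.wav"))
```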

Limitations: no language model (LM) rescoring is applied to the output, and the model is trained on Common Voice 8.0 rather than the more recent 13.0.
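
To illustrate what decoding without an LM typically means for a CTC model such as wav2vec 2.0: the most likely token is taken per audio frame, consecutive repeats are collapsed, and the blank token is dropped. The sketch below shows this greedy procedure on toy token IDs. The vocabulary, `BLANK_ID`, and `greedy_ctc_decode` are illustrative assumptions and do not reflect this model's actual tokenizer.

```python
from itertools import groupby

BLANK_ID = 0  # hypothetical ID of the CTC blank token
VOCAB = {1: "h", 2: "a", 3: "l", 4: "o"}  # toy vocabulary


def greedy_ctc_decode(frame_ids):
    """Collapse repeated frame predictions, then remove blanks."""
    collapsed = [token for token, _ in groupby(frame_ids)]
    return "".join(VOCAB[token] for token in collapsed if token != BLANK_ID)


# Per-frame argmax IDs; the blank between the two 3s keeps the double "l".
frames = [1, 1, 0, 2, 3, 0, 3, 3, 4, 4, 0]
print(greedy_ctc_decode(frames))  # prints "hallo"
```

Because each frame is decided independently of any word-level context, errors that a language model could correct stay in the transcript.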

Training and evaluation data

The evaluation split used is the one available in the Common Voice 8.0 Frisian subset. The train split corresponds to all of the validated data except for the recordings found in the evaluation and test splits.
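
The split construction described above amounts to a set difference on clip identifiers. The following is a sketch under assumptions, not the author's script: it assumes the Common Voice TSV layout in which each row has a `path` column naming the clip, and that the evaluation split corresponds to the release's `dev.tsv`. The helper names `read_rows` and `build_train_rows` are hypothetical.

```python
import csv


def read_rows(tsv_path):
    """Load a Common Voice style TSV file as a list of dicts."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)


def build_train_rows(validated_rows, eval_rows, test_rows):
    """Keep validated clips that appear in neither held-out split."""
    held_out = {row["path"] for row in eval_rows}
    held_out |= {row["path"] for row in test_rows}
    return [row for row in validated_rows if row["path"] not in held_out]


# Toy rows standing in for validated.tsv, dev.tsv and test.tsv.
validated = [{"path": "a.mp3"}, {"path": "b.mp3"}, {"path": "c.mp3"}, {"path": "d.mp3"}]
eval_rows = [{"path": "b.mp3"}]
test_rows = [{"path": "d.mp3"}]
train = build_train_rows(validated, eval_rows, test_rows)
print([row["path"] for row in train])  # prints ['a.mp3', 'c.mp3']
```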

Training procedure

The script used for training this model can be found in this GitHub repository: link.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 32
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 40
mixed_precision_training: Native AMP
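
As a rough guide to reproducing this configuration, the sketch below maps the values above onto `transformers.TrainingArguments` keyword names and re-implements the linear schedule with 10% warmup to show how the learning rate evolves. Several points are assumptions: the batch size of 32 is treated as a per-device value on a single device without gradient accumulation, `output_dir` is a hypothetical path, and `lr_at_step` is an illustrative re-implementation rather than the library's scheduler.

```python
import math

HYPERPARAMETERS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 32,  # assumption: one device, no accumulation
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.98,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_ratio": 0.1,
    "num_train_epochs": 40,
    "fp16": True,  # "Native AMP" mixed precision; needs a CUDA GPU
}


def build_training_arguments(output_dir="./xls-r-frisian"):
    """Build TrainingArguments from the values above (hypothetical output_dir)."""
    from transformers import TrainingArguments  # lazy import

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMETERS)


def lr_at_step(step, total_steps, base_lr=5e-5, warmup_ratio=0.1):
    """Linear warmup to base_lr, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))
```

With a hypothetical run of 1000 optimizer steps, `lr_at_step(100, 1000)` reaches the peak rate of 5e-05 at the end of warmup, and `lr_at_step(1000, 1000)` has decayed to zero.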

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
14.7268	0.43	400	8.7389	1.0
5.3377	0.86	800	3.7016	1.0
3.343	1.29	1200	3.0984	1.0
3.0306	1.71	1600	2.9643	1.0
2.9511	2.14	2000	2.9273	1.0
2.9078	2.57	2400	2.8202	1.0
2.4965	3.0	2800	1.3805	0.8888
1.5378	3.43	3200	0.6556	0.5720
1.119	3.86	3600	0.4260	0.4077
0.9159	4.29	4000	0.3457	0.3322
0.8037	4.72	4400	0.2765	0.2850
0.7411	5.14	4800	0.2447	0.2473
0.6767	5.57	5200	0.2176	0.2234
0.6296	6.0	5600	0.1996	0.2078
0.6165	6.43	6000	0.1891	0.1977
0.5856	6.86	6400	0.1763	0.1855
0.5674	7.29	6800	0.1708	0.1797
0.5399	7.72	7200	0.1593	0.1694
0.5195	8.15	7600	0.1551	0.1660
0.4973	8.57	8000	0.1509	0.1583
0.4907	9.0	8400	0.1480	0.1525
0.4681	9.43	8800	0.1389	0.1494
0.4513	9.86	9200	0.1368	0.1414
0.4486	10.29	9600	0.1294	0.1390
0.4381	10.72	10000	0.1262	0.1354
0.443	11.15	10400	0.1234	0.1313
0.4182	11.58	10800	0.1196	0.1294
0.4036	12.0	11200	0.1194	0.1259
0.4027	12.43	11600	0.1170	0.1226
0.4066	12.86	12000	0.1156	0.1224
0.3885	13.29	12400	0.1136	0.1174
0.3859	13.72	12800	0.1121	0.1146
0.3812	14.15	13200	0.1097	0.1141
0.3774	14.58	13600	0.1059	0.1130
0.3678	15.01	14000	0.1058	0.1096
0.3586	15.43	14400	0.1026	0.1099
0.3612	15.86	14800	0.1010	0.1076
0.3626	16.29	15200	0.0993	0.1068
0.353	16.72	15600	0.0974	0.1046
0.3564	17.15	16000	0.0986	0.1037
0.3447	17.58	16400	0.0977	0.1041
0.3454	18.01	16800	0.0945	0.1023
0.3338	18.44	17200	0.0904	0.0996
0.3359	18.86	17600	0.0950	0.1002
0.3179	19.29	18000	0.0911	0.0977
0.3202	19.72	18400	0.0906	0.0979
0.3317	20.15	18800	0.0894	0.0963
0.3187	20.58	19200	0.0878	0.0938
0.3075	21.01	19600	0.0893	0.0937
0.3032	21.44	20000	0.0872	0.0923
0.3048	21.86	20400	0.0848	0.0921
0.3045	22.29	20800	0.0860	0.0887
0.316	22.72	21200	0.0841	0.0896
0.2986	23.15	21600	0.0840	0.0876
0.294	23.58	22000	0.0824	0.0862
0.313	24.01	22400	0.0814	0.0855
0.2864	24.44	22800	0.0816	0.0861
0.2927	24.87	23200	0.0807	0.0875
0.294	25.29	23600	0.0829	0.0826
0.2834	25.72	24000	0.0794	0.0823
0.2852	26.15	24400	0.0781	0.0815
0.2823	26.58	24800	0.0781	0.0821
0.2835	27.01	25200	0.0788	0.0826
0.2763	27.44	25600	0.0789	0.0823
0.2845	27.87	26000	0.0767	0.0803
0.2777	28.3	26400	0.0775	0.0809
0.275	28.72	26800	0.0758	0.0794
0.2707	29.15	27200	0.0745	0.0790
0.2734	29.58	27600	0.0765	0.0797
0.2716	30.01	28000	0.0746	0.0780
0.2626	30.44	28400	0.0756	0.0776
0.2671	30.87	28800	0.0742	0.0763
0.2592	31.3	29200	0.0730	0.0771
0.2685	31.73	29600	0.0733	0.0760
0.2727	32.15	30000	0.0738	0.0758
0.2564	32.58	30400	0.0731	0.0763
0.2528	33.01	30800	0.0730	0.0758
0.2573	33.44	31200	0.0717	0.0746
0.2597	33.87	31600	0.0718	0.0760
0.2511	34.3	32000	0.0737	0.0750
0.2551	34.73	32400	0.0732	0.0758
0.26	35.16	32800	0.0724	0.0746
0.2563	35.58	33200	0.0717	0.0730
0.2559	36.01	33600	0.0707	0.0734
0.2499	36.44	34000	0.0721	0.0729
0.252	36.87	34400	0.0716	0.0723
0.2448	37.3	34800	0.0711	0.0725
0.248	37.73	35200	0.0710	0.0727
0.2568	38.16	35600	0.0710	0.0720
0.2471	38.59	36000	0.0707	0.0725
0.2464	39.01	36400	0.0705	0.0719
0.2477	39.44	36800	0.0706	0.0727
0.2482	39.87	37200	0.0707	0.0724
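
The evaluation figures reported at the top of this card match the final logged row (step 37200). To compare logged checkpoints, for example to see which one had the lowest validation WER, a small sketch over the last five rows of the table could look as follows. `LogRow` and `best_by_wer` are illustrative names.

```python
from typing import NamedTuple


class LogRow(NamedTuple):
    step: int
    validation_loss: float
    wer: float


# The last five rows of the table above, copied verbatim.
LAST_ROWS = [
    LogRow(35600, 0.0710, 0.0720),
    LogRow(36000, 0.0707, 0.0725),
    LogRow(36400, 0.0705, 0.0719),
    LogRow(36800, 0.0706, 0.0727),
    LogRow(37200, 0.0707, 0.0724),
]


def best_by_wer(rows):
    """Return the logged row with the lowest validation WER."""
    return min(rows, key=lambda row: row.wer)


best = best_by_wer(LAST_ROWS)
print(best.step, best.wer)  # prints "36400 0.0719"
```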

Framework versions

Transformers 4.28.1
Pytorch 2.0.0+cu117
Datasets 2.11.0
Tokenizers 0.13.3
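
To check a local environment against these versions, a sketch using the standard library's `importlib.metadata` is shown below. The PyPI package names in `EXPECTED` (in particular `torch` for Pytorch) and the helper names are assumptions. Local build suffixes such as `+cu117` are ignored in the comparison.

```python
from importlib.metadata import PackageNotFoundError, version

EXPECTED = {
    "transformers": "4.28.1",
    "torch": "2.0.0",
    "datasets": "2.11.0",
    "tokenizers": "0.13.3",
}


def installed_versions(packages=EXPECTED):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(found, expected=EXPECTED):
    """Return {package: (installed, expected)} for versions that differ."""
    result = {}
    for name, wanted in expected.items():
        have = found.get(name)
        base = have.split("+")[0] if have else None  # drop e.g. "+cu117"
        if base != wanted:
            result[name] = (have, wanted)
    return result


# Usage: print(mismatches(installed_versions()))
```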

greenw0lf/wav2vec2-large-xls-r-300m-frisian-cv-8