DewiBrynJones committed on
Commit
f2bdd30
1 Parent(s): a4d3644

Update README.md

  1. README.md +47 -15
README.md CHANGED
@@ -1,15 +1,17 @@
1
  ---
2
  license: apache-2.0
3
  base_model: facebook/wav2vec2-large-xlsr-53
4
- tags:
5
- - automatic-speech-recognition
6
- - techiaith/commonvoice_16_1_en_cy
7
- - generated_from_trainer
8
  metrics:
9
  - wer
10
  model-index:
11
  - name: wav2vec2-xlsr-53-ft-ccv-en-cy
12
  results: []
13
  ---
14
 
15
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -17,22 +19,52 @@ should probably proofread and complete it, then remove this comment. -->
17
 
18
  # wav2vec2-xlsr-53-ft-ccv-en-cy
19
 
20
- This model is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on the TECHIAITH/COMMONVOICE_16_1_EN_CY - DEFAULT dataset.
21
- It achieves the following results on the evaluation set:
22
- - Loss: 0.2754
23
- - Wer: 0.2115
24
 
25
- ## Model description
26
 
27
- More information needed
28
 
29
- ## Intended uses & limitations
30
 
31
- More information needed
32
 
33
- ## Training and evaluation data
34
 
35
- More information needed
36
 
37
  ## Training procedure
38
 
@@ -80,4 +112,4 @@ The following hyperparameters were used during training:
80
  - Transformers 4.38.2
81
  - Pytorch 2.2.1+cu121
82
  - Datasets 2.18.0
83
- - Tokenizers 0.15.2
 
1
  ---
2
  license: apache-2.0
3
  base_model: facebook/wav2vec2-large-xlsr-53
4
  metrics:
5
  - wer
6
  model-index:
7
  - name: wav2vec2-xlsr-53-ft-ccv-en-cy
8
  results: []
9
+ datasets:
10
+ - techiaith/commonvoice_16_1_en_cy
11
+ language:
12
+ - cy
13
+ - en
14
+ pipeline_tag: automatic-speech-recognition
15
  ---
16
 
17
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 
19
 
20
  # wav2vec2-xlsr-53-ft-ccv-en-cy
21
 
22
+ A speech recognition acoustic model for Welsh and English, fine-tuned from [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) using English/Welsh balanced data derived from version 11 of their respective Common Voice datasets (https://commonvoice.mozilla.org/cy/datasets). Custom bilingual Common Voice train/dev and test splits were built using the scripts at https://github.com/techiaith/docker-commonvoice-custom-splits-builder#introduction
23
+
24
+ Source code and scripts for training wav2vec2-xlsr-ft-en-cy can be found at [https://github.com/techiaith/docker-wav2vec2-cy](https://github.com/techiaith/docker-wav2vec2-cy/blob/main/train/fine-tune/python/run_en_cy.sh).
25
+
26
+
27
+ ## Usage
28
+
29
+ The wav2vec2-xlsr-53-ft-ccv-en-cy model can be used directly as follows:
30
+
31
+ ```python
32
+ import torch
33
+ import torchaudio
34
+ import librosa
35
+
36
+ from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
37
+
38
+ processor = Wav2Vec2Processor.from_pretrained("techiaith/wav2vec2-xlsr-53-ft-ccv-en-cy")
39
+ model = Wav2Vec2ForCTC.from_pretrained("techiaith/wav2vec2-xlsr-53-ft-ccv-en-cy")
40
+
41
+ audio, rate = librosa.load(audio_file, sr=16000)  # audio_file: path to the recording to transcribe
42
+
43
+ inputs = processor(audio, sampling_rate=16_000, return_tensors="pt", padding=True)
44
+
45
+ with torch.no_grad():
46
+     logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
47
+
48
+ # greedy decoding
49
+ predicted_ids = torch.argmax(logits, dim=-1)
50
+
51
+ print("Prediction:", processor.batch_decode(predicted_ids))
52
+
53
+ ```
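
The greedy decoding step in the usage snippet picks the highest-scoring token for every audio frame; decoding then collapses consecutive repeats and drops the CTC blank token. The following is a minimal pure-Python sketch of that collapse rule only. The toy vocabulary and the blank id of 0 are hypothetical and are not taken from this model's tokenizer, which also handles word delimiters and special tokens.

```python
from itertools import groupby

# Hypothetical toy vocabulary; the real one is stored with the model's tokenizer.
BLANK_ID = 0
VOCAB = {1: "c", 2: "a", 3: "t", 4: "o"}


def ctc_greedy_collapse(frame_ids, blank_id=BLANK_ID):
    """Collapse consecutive repeated ids, then drop blanks (the CTC rule)."""
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return [token_id for token_id in collapsed if token_id != blank_id]


def ids_to_text(token_ids, vocab=VOCAB):
    """Map token ids back to characters using the toy vocabulary."""
    return "".join(vocab[token_id] for token_id in token_ids)


# Per-frame argmax ids, as torch.argmax(logits, dim=-1) would give for one utterance.
frames = [1, 1, 0, 2, 2, 2, 0, 0, 3, 3]
print(ids_to_text(ctc_greedy_collapse(frames)))
```

A blank between two identical ids keeps them as separate characters, which is how CTC represents doubled letters.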
54
 
55
+ ## Evaluation
56
 
 
57
 
58
+ On a balanced English+Welsh test set derived from Common Voice version 16.1, techiaith/wav2vec2-xlsr-53-ft-ccv-en-cy achieves a WER of **23.79%**.
59
 
60
+ However, when evaluated on language-specific test sets, the model performs considerably better on Welsh than on English.
61
 
62
+ | Common Voice Test Set Language | WER (%) | CER (%) |
63
+ | -------- | --- | --- |
64
+ | EN+CY | 23.79 | 9.68 |
65
+ | EN | 34.47 | 14.83 |
66
+ | CY | 12.34 | 3.55 |
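
For readers unfamiliar with the two metrics in the table, WER and CER are edit distances between reference and predicted transcripts, counted over words and characters respectively and divided by the reference length. The sketch below is a self-contained illustration under that standard definition. It is not the evaluation script used for the figures above, and the text normalisation applied there (casing, punctuation) is not reproduced here. The function names and sample sentences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def error_rate(references, hypotheses, unit):
    """Corpus-level error rate in percent; unit is 'word' (WER) or 'char' (CER)."""
    split = str.split if unit == "word" else list
    errors = sum(edit_distance(split(r), split(h))
                 for r, h in zip(references, hypotheses))
    total = sum(len(split(r)) for r in references)
    return 100.0 * errors / total


# Hypothetical reference transcripts and model predictions.
refs = ["mae hi'n braf heddiw", "the cat sat"]
hyps = ["mae hi braf heddiw", "the cat sat"]
print(f"WER: {error_rate(refs, hyps, 'word'):.2f}%")
print(f"CER: {error_rate(refs, hyps, 'char'):.2f}%")
```

Running the same function separately on Welsh-only and English-only sentence lists gives per-language figures of the kind shown in the EN and CY rows.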
67
 
 
68
 
69
  ## Training procedure
70
 
 
112
  - Transformers 4.38.2
113
  - Pytorch 2.2.1+cu121
114
  - Datasets 2.18.0
115
+ - Tokenizers 0.15.2