DewiBrynJones committed Update README.md (commit f2bdd30, parent a4d3644)

README.md
---
license: apache-2.0
base_model: facebook/wav2vec2-large-xlsr-53
metrics:
- wer
model-index:
- name: wav2vec2-xlsr-53-ft-ccv-en-cy
  results: []
datasets:
- techiaith/commonvoice_16_1_en_cy
language:
- cy
- en
pipeline_tag: automatic-speech-recognition
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# wav2vec2-xlsr-53-ft-ccv-en-cy

A speech recognition acoustic model for Welsh and English, fine-tuned from [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) using balanced English/Welsh data derived from version 11 of the respective Common Voice datasets (https://commonvoice.mozilla.org/cy/datasets). Custom bilingual Common Voice train/dev and test splits were built with the scripts at https://github.com/techiaith/docker-commonvoice-custom-splits-builder#introduction.

Source code and scripts for training wav2vec2-xlsr-ft-en-cy can be found at [https://github.com/techiaith/docker-wav2vec2-cy](https://github.com/techiaith/docker-wav2vec2-cy/blob/main/train/fine-tune/python/run_en_cy.sh).

## Usage

The wav2vec2-xlsr-53-ft-ccv-en-cy model can be used directly as follows:

```python
import torch
import librosa

from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("techiaith/wav2vec2-xlsr-53-ft-ccv-en-cy")
model = Wav2Vec2ForCTC.from_pretrained("techiaith/wav2vec2-xlsr-53-ft-ccv-en-cy")

# load and resample the recording to the 16 kHz the model expects
audio, rate = librosa.load(audio_file, sr=16000)

inputs = processor(audio, sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

# greedy decoding
predicted_ids = torch.argmax(logits, dim=-1)

print("Prediction:", processor.batch_decode(predicted_ids))
```
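The `batch_decode` call above applies the standard CTC collapse rule to the per-frame argmax ids: merge consecutive repeats, then drop blank tokens. A minimal sketch of that rule, using a hypothetical toy vocabulary rather than the model's real tokenizer:

```python
def ctc_greedy_collapse(ids, blank_id=0):
    """Merge consecutive repeated ids, then drop blanks (the CTC greedy-decoding rule)."""
    out = []
    prev = None
    for i in ids:
        if i != prev and i != blank_id:
            out.append(i)
        prev = i
    return out

# toy vocabulary for illustration only (not the model's actual vocabulary)
vocab = {1: "h", 2: "e", 3: "l", 4: "o"}
frame_ids = [1, 1, 0, 2, 2, 3, 0, 3, 4, 4]  # per-frame argmax ids, 0 = blank
print("".join(vocab[i] for i in ctc_greedy_collapse(frame_ids)))  # -> hello
```

Note that the blank between the two `l` frames is what allows a genuinely doubled letter to survive the repeat-merging step.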

## Evaluation

On a balanced English+Welsh test set derived from Common Voice version 16.1, the WER of techiaith/wav2vec2-xlsr-53-ft-ccv-en-cy is **23.79%**.

However, when evaluated with language-specific test sets, the model performs markedly better on Welsh than on English:

| Common Voice Test Set Language | WER (%) | CER (%) |
| ------------------------------ | ------- | ------- |
| EN+CY                          | 23.79   | 9.68    |
| EN                             | 34.47   | 14.83   |
| CY                             | 12.34   | 3.55    |

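WER, as reported above, is the word-level edit distance between hypothesis and reference divided by the number of reference words. A minimal self-contained sketch of the metric (for illustration only; the published figures come from the model's own evaluation run, not from this snippet):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words and first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting i words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting j words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("mae hi yn braf", "mae yn braf"))  # one deletion over four words -> 0.25
```

CER is computed the same way but over characters instead of words, which is why it is consistently lower than WER in the table.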
## Training procedure

### Framework versions

- Transformers 4.38.2
- Pytorch 2.2.1+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2