DewiBrynJones committed Update README.md (commit f2bdd30, parent a4d3644)

README.md
---
license: apache-2.0
base_model: facebook/wav2vec2-large-xlsr-53
metrics:
- wer
model-index:
- name: wav2vec2-xlsr-53-ft-ccv-en-cy
  results: []
datasets:
- techiaith/commonvoice_16_1_en_cy
language:
- cy
- en
pipeline_tag: automatic-speech-recognition
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# wav2vec2-xlsr-53-ft-ccv-en-cy

A speech recognition acoustic model for Welsh and English, fine-tuned from [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) using balanced English/Welsh data derived from version 11 of the respective Common Voice datasets (https://commonvoice.mozilla.org/cy/datasets). Custom bilingual Common Voice train/dev and test splits were built with the scripts at https://github.com/techiaith/docker-commonvoice-custom-splits-builder#introduction.

Source code and scripts for training wav2vec2-xlsr-ft-en-cy can be found at [https://github.com/techiaith/docker-wav2vec2-cy](https://github.com/techiaith/docker-wav2vec2-cy/blob/main/train/fine-tune/python/run_en_cy.sh).

## Usage

The wav2vec2-xlsr-53-ft-ccv-en-cy model can be used directly as follows:

```python
import torch
import librosa

from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("techiaith/wav2vec2-xlsr-53-ft-ccv-en-cy")
model = Wav2Vec2ForCTC.from_pretrained("techiaith/wav2vec2-xlsr-53-ft-ccv-en-cy")

# load and resample the recording to the 16 kHz the model expects
audio, rate = librosa.load(audio_file, sr=16000)

inputs = processor(audio, sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

# greedy decoding
predicted_ids = torch.argmax(logits, dim=-1)

print("Prediction:", processor.batch_decode(predicted_ids))
```
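The `batch_decode` call above applies the standard CTC collapse rule to the per-frame argmax ids: merge consecutive repeats, then drop blank tokens. A minimal sketch of that rule, using a hypothetical toy vocabulary rather than the model's real tokenizer:

```python
def ctc_greedy_collapse(ids, blank_id=0):
    """Merge consecutive repeated ids, then drop blanks (the CTC greedy-decoding rule)."""
    out = []
    prev = None
    for i in ids:
        if i != prev and i != blank_id:
            out.append(i)
        prev = i
    return out

# toy vocabulary for illustration only (not the model's actual vocabulary)
vocab = {1: "h", 2: "e", 3: "l", 4: "o"}
frame_ids = [1, 1, 0, 2, 2, 3, 0, 3, 4, 4]  # per-frame argmax ids, 0 = blank
print("".join(vocab[i] for i in ctc_greedy_collapse(frame_ids)))  # -> hello
```

Note that the blank between the two `l` frames is what allows a genuinely doubled letter to survive the repeat-merging step.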

## Evaluation

On a balanced English+Welsh test set derived from Common Voice version 16.1, the WER of techiaith/wav2vec2-xlsr-53-ft-ccv-en-cy is **23.79%**.

However, when evaluated with language-specific test sets, the model performs markedly better on Welsh than on English:

| Common Voice Test Set Language | WER (%) | CER (%) |
| ------------------------------ | ------- | ------- |
| EN+CY                          | 23.79   | 9.68    |
| EN                             | 34.47   | 14.83   |
| CY                             | 12.34   | 3.55    |

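WER, as reported above, is the word-level edit distance between hypothesis and reference divided by the number of reference words. A minimal self-contained sketch of the metric (for illustration only; the published figures come from the model's own evaluation run, not from this snippet):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words and first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting i words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting j words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("mae hi yn braf", "mae yn braf"))  # one deletion over four words -> 0.25
```

CER is computed the same way but over characters instead of words, which is why it is consistently lower than WER in the table.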
## Training procedure

### Framework versions

- Transformers 4.38.2
- Pytorch 2.2.1+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2