Update README.md
README.md CHANGED

@@ -23,12 +23,12 @@ model-index:
     metrics:
     - name: Test WER
       type: wer
-      value:
+      value: 29.52
 ---
 
 # Wav2Vec2-Large-XLSR-53-lg
 
-Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Luganda using the [Common Voice](https://huggingface.co/datasets/common_voice) dataset, using train, validation and other (
+Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Luganda using the [Common Voice](https://huggingface.co/datasets/common_voice) dataset, using train, validation and other (excluding voices that are in the test set), and taking the test data for validation as well as test.
 When using this model, make sure that your speech input is sampled at 16kHz.
 
 ## Usage
@@ -126,10 +126,11 @@ result = test_dataset.map(evaluate, batched=True, batch_size=8)
 print("WER: {:2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["norm_text"])))
 ```
 
-**Test Result**:
+**Test Result**: 29.52 %
 
 ## Training
 
-The Common Voice `train`, `validation` and `other` datasets were used for training, augmented to twice the original size with added noise and manipulated pitch, phase and intensity.
+The Common Voice `train`, `validation` and `other` datasets were used for training, excluding voices that are in both the `other` and `test` datasets. The data was augmented to twice the original size with added noise and manipulated pitch, phase and intensity.
+Training proceeded for 60 epochs, on 1 V100 GPU provided by OVHcloud. The `test` data was used for validation.
 
-The script used for training
+The [script used for training](https://github.com/serapio/transformers/blob/feature/xlsr-finetune/examples/research_projects/wav2vec2/run_common_voice.py) is adapted from the [example script provided in the transformers repo](https://github.com/huggingface/transformers/blob/master/examples/research_projects/wav2vec2/run_common_voice.py).