|
--- |
|
language: |
|
- eo |
|
license: apache-2.0 |
|
tags: |
|
- automatic-speech-recognition |
|
- mozilla-foundation/common_voice_13_0 |
|
- generated_from_trainer |
|
metrics: |
|
- wer |
|
model-index: |
|
- name: wav2vec2-common_voice_13_0-eo-3 |
|
results: [] |
|
--- |
|
|
|
# wav2vec2-common_voice_13_0-eo-3, an Esperanto speech recognizer |
|
|
|
This model is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on the [mozilla-foundation/common_voice_13_0](https://huggingface.co/datasets/mozilla-foundation/common_voice_13_0) Esperanto dataset. |
|
It achieves the following results on the evaluation set: |
|
|
|
- Loss: 0.2191 |
|
- CER: 0.0208

- WER: 0.0687
|
|
|
Predictions on the first 10 samples of the test set:
|
|
|
| Actual<br>Predicted | CER | |
|
|:--------------------|:----| |
|
| `la orienta parto apud benino kaj niĝerio estis nomita sklavmarbordo`<br>`la orienta parto apud benino kaj niĝerio estis nomita sklavmarbordo` | 0.0 | |
|
| `en la sekva jaro li ricevis premion`<br>`en la sekva jaro li ricevis prenion` | 0.0286 |
|
| `ŝi studis historion ĉe la universitato de brita kolumbio`<br>`ŝi studis historion ĉe la universitato de brita kolumbio` | 0.0 | |
|
| `larĝaj ŝtupoj kuras al la fasado`<br>`larĝaj ŝtupoj kuras al la fasado` | 0.0 | |
|
| `la municipo ĝuas duan epokon de etendo kaj disvolviĝo`<br>`la municipo ĝuas duonepokon de tendo kaj disvolviĝo` | 0.0566 |
|
| `li estis ankaŭ katedrestro kaj dekano`<br>`li estis ankaŭ katedresto kaj dekano` | 0.0270 |
|
| `librovendejo apartenas al la muzeo`<br>`librovendejo apartenas al la muzeo` | 0.0 | |
|
| `ĝi estas kutime malfacile videbla kaj troviĝas en subkreskaĵaro de arbaroj`<br>`ĝi estas kutime malfacile videbla kaj troviĝas en subkreskaĵo de arbaroj` | 0.0270 |
|
| `unue ili estas ruĝaj poste brunaj`<br>`unue ili estas ruĝaj poste brunaj` | 0.0 | |
|
| `la loĝantaro laboras en la proksima ĉefurbo`<br>`la loĝantaro laboras en la proksima ĉefurbo` | 0.0 | |
|
|
|
|
|
## Model description |
|
|
|
See [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53). |
|
|
|
## Intended uses & limitations |
|
|
|
Speech recognition for Esperanto. The base model was pretrained and fine-tuned on 16 kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16 kHz.
|
|
|
## Training and evaluation data |
|
|
|
The training split was set to `train[:15000]` while the eval split was set to `validation[:1500]`. |
|
|
|
## Training procedure |
|
|
|
I used [`run_speech_recognition_ctc.py`](https://github.com/huggingface/transformers/tree/main/examples/pytorch/speech-recognition) with the following `train.json` file passed to it: |
|
|
|
```json |
|
{ |
|
"dataset_name": "mozilla-foundation/common_voice_13_0", |
|
"model_name_or_path": "facebook/wav2vec2-large-xlsr-53", |
|
"dataset_config_name": "eo", |
|
"output_dir": "./wav2vec2-common_voice_13_0-eo-3", |
|
"train_split_name": "train[:15000]", |
|
"eval_split_name": "validation[:1500]", |
|
"eval_metrics": ["cer", "wer"], |
|
"overwrite_output_dir": true, |
|
"preprocessing_num_workers": 8, |
|
"num_train_epochs": 100, |
|
"per_device_train_batch_size": 8, |
|
"gradient_accumulation_steps": 4, |
|
"gradient_checkpointing": true, |
|
"learning_rate": 3e-5, |
|
"warmup_steps": 500, |
|
"evaluation_strategy": "steps", |
|
"text_column_name": "sentence", |
|
"length_column_name": "input_length", |
|
"save_steps": 1000, |
|
"eval_steps": 1000, |
|
"layerdrop": 0.1, |
|
"save_total_limit": 3, |
|
"freeze_feature_encoder": true, |
|
"chars_to_ignore": "-!\"'(),.:;=?_`¨«¸»ʼ‑–—‘’“”„…‹›♫?", |
|
"chars_to_substitute": { |
|
"przy": "pŝe", |
|
"byn": "bin", |
|
"cx": "ĉ", |
|
"sx": "ŝ", |
|
"fi": "fi", |
|
"fl": "fl", |
|
"ǔ": "ŭ", |
|
"ñ": "nj", |
|
"á": "a", |
|
"é": "e", |
|
"ü": "ŭ", |
|
"y": "j", |
|
"qu": "ku" |
|
}, |
|
"fp16": true, |
|
"group_by_length": true, |
|
"push_to_hub": true, |
|
"do_train": true, |
|
"do_eval": true |
|
} |
|
``` |
|
|
|
I went through the dataset to find non-speech characters, and placed these in `chars_to_ignore`. There were also character sequences that could be mapped to Esperanto phonemes, and these went into the `chars_to_substitute` dictionary. Supporting that dictionary required adding a new argument to the script:
|
|
|
```py |
|
def dict_field(default=None, metadata=None):
    return field(default_factory=lambda: default, metadata=metadata)


@dataclass
class DataTrainingArguments:
    ...
    chars_to_substitute: Optional[Dict[str, str]] = dict_field(
        default=None,
        metadata={"help": "A dict of characters to replace."},
    )
|
|
|
``` |
|
|
|
Then I copied `remove_special_characters` to do the actual substitution: |
|
|
|
```py |
|
def remove_special_characters(batch):
    text = batch[text_column_name]
    if chars_to_ignore_regex is not None:
        text = re.sub(chars_to_ignore_regex, "", text)
    batch["target_text"] = text.lower() + " "
    return batch


def substitute_characters(batch):
    text: str = batch["target_text"]
    if data_args.chars_to_substitute is not None:
        for k, v in data_args.chars_to_substitute.items():
            # str.replace returns a new string, so reassign the result
            text = text.replace(k, v)
    batch["target_text"] = text.lower()
    return batch


with training_args.main_process_first(desc="dataset map special characters removal"):
    raw_datasets = raw_datasets.map(
        remove_special_characters,
        remove_columns=[text_column_name],
        desc="remove special characters from datasets",
    )

with training_args.main_process_first(desc="dataset map special characters substitute"):
    raw_datasets = raw_datasets.map(
        substitute_characters,
        desc="substitute special characters in datasets",
    )
|
``` |
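A minimal standalone sketch of what the substitution step does (my own illustration, using two of the mappings from the table above):

```python
# Sequential str.replace over a substitution dict, as in substitute_characters.
subs = {"cx": "ĉ", "sx": "ŝ"}  # x-system digraphs -> Esperanto letters

def substitute(text, subs):
    for k, v in subs.items():
        text = text.replace(k, v)
    return text

print(substitute("cxu vi parolas esperanton", subs))  # ĉu vi parolas esperanton
```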
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 3e-05 |
|
- train_batch_size: 8 |
|
- eval_batch_size: 8 |
|
- seed: 42 |
|
- gradient_accumulation_steps: 4 |
|
- total_train_batch_size: 32 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- layerdrop: 0.1 |
|
- lr_scheduler_type: linear |
|
- lr_scheduler_warmup_steps: 500 |
|
- num_epochs: 100 |
|
- mixed_precision_training: Native AMP |
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | CER | Validation Loss | WER |
|
|:-------------:|:-----:|:-----:|:------:|:---------------:|:------:| |
|
| 2.6416 | 2.13 | 1000 | 0.1541 | 0.8599 | 0.6449 | |
|
| 0.2633 | 4.27 | 2000 | 0.0335 | 0.1897 | 0.1431 | |
|
| 0.1739 | 6.4 | 3000 | 0.0289 | 0.1732 | 0.1145 | |
|
| 0.1378 | 8.53 | 4000 | 0.0276 | 0.1729 | 0.1066 | |
|
| 0.1172 | 10.67 | 5000 | 0.0268 | 0.1773 | 0.1019 | |
|
| 0.1049 | 12.8 | 6000 | 0.0255 | 0.1701 | 0.0937 | |
|
| 0.0951 | 14.93 | 7000 | 0.0253 | 0.1718 | 0.0933 | |
|
| 0.0851 | 17.07 | 8000 | 0.0239 | 0.1787 | 0.0834 | |
|
| 0.0809 | 19.2 | 9000 | 0.0235 | 0.1802 | 0.0835 | |
|
| 0.0756 | 21.33 | 10000 | 0.0239 | 0.1784 | 0.0855 | |
|
| 0.0708 | 23.47 | 11000 | 0.0235 | 0.1748 | 0.0824 | |
|
| 0.0657 | 25.6 | 12000 | 0.0228 | 0.1830 | 0.0796 | |
|
| 0.0605 | 27.73 | 13000 | 0.0230 | 0.1896 | 0.0798 | |
|
| 0.0583 | 29.87 | 14000 | 0.0224 | 0.1889 | 0.0778 | |
|
| 0.0608 | 32.0 | 15000 | 0.0223 | 0.1849 | 0.0757 | |
|
| 0.0556 | 34.13 | 16000 | 0.0223 | 0.1872 | 0.0767 | |
|
| 0.0534 | 36.27 | 17000 | 0.0221 | 0.1893 | 0.0751 | |
|
| 0.0523 | 38.4 | 18000 | 0.0218 | 0.1925 | 0.0729 | |
|
| 0.0494 | 40.53 | 19000 | 0.0221 | 0.1957 | 0.0745 | |
|
| 0.0475 | 42.67 | 20000 | 0.0217 | 0.1961 | 0.0740 | |
|
| 0.048 | 44.8 | 21000 | 0.0214 | 0.1957 | 0.0714 | |
|
| 0.0459 | 46.93 | 22000 | 0.0215 | 0.1968 | 0.0717 | |
|
| 0.0435 | 49.07 | 23000 | 0.0217 | 0.2008 | 0.0717 | |
|
| 0.0428 | 51.2 | 24000 | 0.0212 | 0.1991 | 0.0696 | |
|
| 0.0418 | 53.33 | 25000 | 0.0215 | 0.2034 | 0.0714 | |
|
| 0.0404 | 55.47 | 26000 | 0.0210 | 0.2014 | 0.0684 | |
|
| 0.0394 | 57.6 | 27000 | 0.0210 | 0.2050 | 0.0681 | |
|
| 0.0399 | 59.73 | 28000 | 0.0211 | 0.2039 | 0.0700 | |
|
| 0.0389 | 61.87 | 29000 | 0.0214 | 0.2091 | 0.0694 | |
|
| 0.038 | 64.0 | 30000 | 0.0210 | 0.2100 | 0.0702 | |
|
| 0.0361 | 66.13 | 31000 | 0.0215 | 0.2119 | 0.0703 | |
|
| 0.0359 | 68.27 | 32000 | 0.0213 | 0.2108 | 0.0714 | |
|
| 0.0354 | 70.4 | 33000 | 0.0211 | 0.2120 | 0.0699 | |
|
| 0.0364 | 72.53 | 34000 | 0.0211 | 0.2128 | 0.0688 | |
|
| 0.0361 | 74.67 | 35000 | 0.0212 | 0.2134 | 0.0694 | |
|
| 0.0332 | 76.8 | 36000 | 0.0210 | 0.2176 | 0.0698 | |
|
| 0.0341 | 78.93 | 37000 | 0.0208 | 0.2170 | 0.0688 | |
|
| 0.032 | 81.07 | 38000 | 0.0209 | 0.2157 | 0.0686 | |
|
| 0.0318 | 83.33 | 39000 | 0.0209 | 0.2166 | 0.0685 | |
|
| 0.0325 | 85.47 | 40000 | 0.0209 | 0.2172 | 0.0687 | |
|
| 0.0316 | 87.6 | 41000 | 0.0208 | 0.2181 | 0.0678 | |
|
| 0.0302 | 89.73 | 42000 | 0.0208 | 0.2171 | 0.0679 | |
|
| 0.0318 | 91.87 | 43000 | 0.0211 | 0.2179 | 0.0702 | |
|
| 0.0314 | 94.0 | 44000 | 0.0208 | 0.2186 | 0.0690 | |
|
| 0.0309 | 96.13 | 45000 | 0.0210 | 0.2193 | 0.0696 | |
|
| 0.031 | 98.27 | 46000 | 0.0208 | 0.2191 | 0.0686 | |
|
|
|
### Framework versions |
|
|
|
- Transformers 4.29.1 |
|
- Pytorch 2.0.1+cu118 |
|
- Datasets 2.12.0 |
|
- Tokenizers 0.13.3 |
|
|
|
## Discussion |
|
|
|
### Nans and Infs |
|
|
|
While debugging other training sessions that used more of the Esperanto Common Voice data (some loss calculations returned `inf` or `nan`), I found that some examples in this model's training set had surprisingly high CER. Some examples:
|
|
|
| file | Actual<br>---<br>Predicted | CER | Comment | |
|
|:-----|:--------------------|:----|:--------| |
|
|common_voice_eo_25365027.mp3 | en la hansaj agentejoj komercistoj el la regiono renkontis kolegojn el aliaj regionoj<br>---<br>a taaj keo eoj eejn kigos eegoj eioeegiooj| 0.61 | No audio | |
|
|common_voice_eo_25365472.mp3 | ili vendas armilojn kaj teknologiojn al la fanatikuloj por gajni monon monon monon<br>---<br>ila mamato aiil ajn kno ion a a aotigojn pu aiooo aj knon | 0.55 | Barely any audio, distorted | |
|
|common_voice_eo_25365836.mp3 | industria apliko estas la kreado de modifitaj bakterioj kiuj produktas deziratan kemian substancon<br>---<br>iiti sieetas la eeadooddddooiooaotooeioj aiicenon | 0.67 | Barely any audio, distorted | |
|
|2600 | ili akiras plenkreskan plumaron nur en la kvina jaro<br>---<br>ili aaros peetaj patato a a sia ro | 0.52 | It's literally someone saying 'injabum'. Thanks, troll. | |
|
|7333 | poste sekvas difinoj de la termino<br>---<br>po | 0.94 | No audio | |
|
|7334 | li gvidis multajn kursojn laŭ la csehmetodo<br>---<br>po | 0.98 | No audio | |
|
|7429 | tamen pro la rekonstruo de kluzoj ne eblas trapasi komplete<br>---<br>po | 0.97 | No audio | |
|
|11662 | lingvotesto estas postulata ekzemple por akceptiĝo en anglalingvaj altlernejoj<br>---<br>linkonteto estastitot etateerteito en pootaeaje lgijoj | 0.58 | No audio | |
|
|
|
Some examples have no audio at all. All of these files are useless for training and should be removed from the training set.
|
|
|
You can see that the model hallucinates a target when there is little or no audio, which is terrible for faithfully reporting what was said. Ideally there would be some measure of certainty, so that only transcriptions with relatively high confidence are kept; however, I can't find a way to get at such a value.
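One rough proxy (my own assumption, not a metric exposed by `transformers`) is the mean softmax probability of the argmax token across frames of the CTC logits; garbled audio tends to produce flatter distributions and lower scores:

```python
# Confidence proxy from CTC logits: mean probability of the per-frame argmax.
# This is an assumption of mine, not an official API; low values tend to
# flag garbled or silent audio.
import torch

def ctc_confidence(logits: torch.Tensor) -> float:
    """logits: (time, vocab) tensor for a single utterance."""
    probs = torch.softmax(logits, dim=-1)
    best = probs.max(dim=-1).values  # probability of the chosen token per frame
    return best.mean().item()

# Sharp (confident) frames score near 1.0; a uniform distribution scores 1/vocab.
sharp = torch.full((10, 5), -10.0)
sharp[:, 0] = 10.0            # one dominant token per frame
flat = torch.zeros(10, 5)     # uniform: maximally uncertain
print(ctc_confidence(sharp))  # close to 1.0
print(ctc_confidence(flat))   # 0.2 (uniform over 5 tokens)
```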
|
|
|
The Common Voice dataset also contains upvotes and downvotes. Of the high-CER sentences above, all had 2 upvotes, some had 0 downvotes, and some had 1. So we cannot rely on upvotes or downvotes to detect quality.
|
|
|
So what to do? |
|
|
|
### Alternative 1 |
|
|
|
Despite these zero- and low-quality files, training seems to work OK. However, we still need to address loss becoming `nan` or `inf`, because that ruins the mean loss calculation.
|
|
|
By running `run_speech_recognition_ctc` with `do_train=false`, setting `model_name_or_path="xekri/wav2vec2-common_voice_13_0-eo-3"`, setting `eval_split_name` to `test`, `validation`, or `train`, and modifying `trainer.py` as follows, I can check whether any losses are `nan` or `inf`:
|
|
|
```py |
|
# To be JSON-serializable, we need to remove numpy types or zero-d tensors
metrics = denumpify_detensorize(metrics)

if all_losses is not None:
    # np.where returns a tuple of index arrays; inspect the array itself
    loss_nan = np.where(np.isnan(all_losses))[0]
    if len(loss_nan) != 0:
        print(f"LOSSES ARE NAN: {loss_nan}")
    loss_inf = np.where(np.isinf(all_losses))[0]
    if len(loss_inf) != 0:
        print(f"LOSSES ARE INF: {loss_inf}")
    metrics[f"{metric_key_prefix}_loss"] = all_losses.mean().item()
|
``` |
|
|
|
Doing this shows that of the 14913 examples in `test`, the following example results in `inf` loss: |
|
|
|
`common_voice_eo_25167318.mp3` |
|
|
|
The audio on this one is severely garbled. It should absolutely be filtered out of the test set.
|
|
|
No `validation` samples result in `inf` or `nan`. |
|
|
|
The following 18 out of 143984 examples in `train` result in `inf` loss: |
|
|
|
```txt |
|
common_voice_eo_25467641.mp3 |
|
common_voice_eo_25467723.mp3 |
|
common_voice_eo_25467791.mp3 |
|
common_voice_eo_25467820.mp3 |
|
common_voice_eo_25467943.mp3 |
|
common_voice_eo_25478612.mp3 |
|
common_voice_eo_25478623.mp3 |
|
common_voice_eo_25478631.mp3 |
|
common_voice_eo_25478756.mp3 |
|
common_voice_eo_25478762.mp3 |
|
common_voice_eo_25478768.mp3 |
|
common_voice_eo_25478769.mp3 |
|
common_voice_eo_25479150.mp3 |
|
common_voice_eo_25479203.mp3 |
|
common_voice_eo_25479229.mp3 |
|
common_voice_eo_25517673.mp3 |
|
common_voice_eo_25517677.mp3 |
|
common_voice_eo_25527739.mp3 |
|
``` |
|
|
|
Those files have no audio. |
|
|
|
### Alternative 2 |
|
|
|
Another possibility is simply to go through the audio files and discard any where the peak amplitude doesn't rise above some threshold.
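A sketch of that check (my own; the 0.01 threshold is a guess that would need tuning, not a tested value):

```python
# Flag clips whose peak absolute amplitude never reaches a threshold.
import numpy as np

def has_audio(samples: np.ndarray, threshold: float = 0.01) -> bool:
    """True if the clip's peak absolute amplitude reaches the threshold."""
    return float(np.max(np.abs(samples))) >= threshold

# Silence (or near-silence) is rejected; any real signal passes.
print(has_audio(np.zeros(16_000, dtype=np.float32)))                # False
print(has_audio(np.array([0.0, 0.4, -0.2], dtype=np.float32)))      # True
```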
|
|
|
### Alternative 3 |
|
|
|
Since this model seems to work well enough, I could run inference on all samples and discard those where the CER (as determined by this model) is too high, say above 0.5, then train another model on the filtered examples. These high-CER examples are:
|
|
|
#### Test set |
|
|
|
71 of 14913 examples in the test set show high CER. |
|
|
|
```txt |
|
common_voice_eo_25214319.mp3 |
|
common_voice_eo_25006596.mp3 |
|
common_voice_eo_27472721.mp3 |
|
common_voice_eo_27715088.mp3 |
|
common_voice_eo_27715091.mp3 |
|
common_voice_eo_26677019.mp3 |
|
common_voice_eo_26677023.mp3 |
|
common_voice_eo_20555291.mp3 |
|
common_voice_eo_25001942.mp3 |
|
common_voice_eo_25457354.mp3 |
|
common_voice_eo_25457355.mp3 |
|
common_voice_eo_25457365.mp3 |
|
common_voice_eo_25457373.mp3 |
|
common_voice_eo_25457396.mp3 |
|
common_voice_eo_25457397.mp3 |
|
common_voice_eo_25457409.mp3 |
|
common_voice_eo_25457410.mp3 |
|
common_voice_eo_25457412.mp3 |
|
common_voice_eo_25457442.mp3 |
|
common_voice_eo_25457444.mp3 |
|
common_voice_eo_25457445.mp3 |
|
common_voice_eo_25457577.mp3 |
|
common_voice_eo_25457578.mp3 |
|
common_voice_eo_28064453.mp3 |
|
common_voice_eo_25047803.mp3 |
|
common_voice_eo_25048418.mp3 |
|
common_voice_eo_25048419.mp3 |
|
common_voice_eo_25048421.mp3 |
|
common_voice_eo_25048423.mp3 |
|
common_voice_eo_25048428.mp3 |
|
common_voice_eo_25048574.mp3 |
|
common_voice_eo_25885643.mp3 |
|
common_voice_eo_25885645.mp3 |
|
common_voice_eo_26794882.mp3 |
|
common_voice_eo_27356529.mp3 |
|
common_voice_eo_25012640.mp3 |
|
common_voice_eo_25303457.mp3 |
|
common_voice_eo_18153931.mp3 |
|
common_voice_eo_18776206.mp3 |
|
common_voice_eo_18776208.mp3 |
|
common_voice_eo_18776219.mp3 |
|
common_voice_eo_18776220.mp3 |
|
common_voice_eo_18776222.mp3 |
|
common_voice_eo_18776223.mp3 |
|
common_voice_eo_18776236.mp3 |
|
common_voice_eo_18776238.mp3 |
|
common_voice_eo_18776244.mp3 |
|
common_voice_eo_18776248.mp3 |
|
common_voice_eo_18776285.mp3 |
|
common_voice_eo_18776287.mp3 |
|
common_voice_eo_18776297.mp3 |
|
common_voice_eo_18776298.mp3 |
|
common_voice_eo_25047998.mp3 |
|
common_voice_eo_25047999.mp3 |
|
common_voice_eo_25048000.mp3 |
|
common_voice_eo_25048001.mp3 |
|
common_voice_eo_25048002.mp3 |
|
common_voice_eo_25053113.mp3 |
|
common_voice_eo_25068355.mp3 |
|
common_voice_eo_25333056.mp3 |
|
common_voice_eo_25371639.mp3 |
|
common_voice_eo_25371640.mp3 |
|
common_voice_eo_25371641.mp3 |
|
common_voice_eo_25371642.mp3 |
|
common_voice_eo_25371643.mp3 |
|
common_voice_eo_22441946.mp3 |
|
common_voice_eo_26622121.mp3 |
|
common_voice_eo_25167318.mp3 |
|
common_voice_eo_25252685.mp3 |
|
common_voice_eo_25252698.mp3 |
|
common_voice_eo_25518636.mp3 |
|
``` |
|
|
|
Note on two of the examples: _saluton kiel vi fartas_ ("hello, how are you") and _atendu momenton_ ("wait a moment") are a fine start to learning Esperanto, but if that's not the text you were asked to record, you're not really helping.
|
|
|
#### Validation set |
|
|
|
17 of 14909 examples in the validation set show high CER.
|
|
|
```txt |
|
common_voice_eo_25392669.mp3 |
|
common_voice_eo_25392674.mp3 |
|
common_voice_eo_25392675.mp3 |
|
common_voice_eo_25392676.mp3 |
|
common_voice_eo_25392678.mp3 |
|
common_voice_eo_25392693.mp3 |
|
common_voice_eo_25392694.mp3 |
|
common_voice_eo_25392695.mp3 |
|
common_voice_eo_25392697.mp3 |
|
common_voice_eo_25392701.mp3 |
|
common_voice_eo_25392702.mp3 |
|
common_voice_eo_25392708.mp3 |
|
common_voice_eo_25392709.mp3 |
|
common_voice_eo_25408881.mp3 |
|
common_voice_eo_25408882.mp3 |
|
common_voice_eo_25408885.mp3 |
|
common_voice_eo_27380623.mp3 |
|
``` |
|
|
|
I didn't include some recordings whose high CER came only from hallucinations during a one-word recording with long silences before and after; the recordings themselves are fine.
|
|
|
#### Training set |
|
|
|
135 of 143984 examples yielded high CER. I removed some from this list that had high CER but sounded fine. |
|
|
|
```txt |
|
common_voice_eo_25365027.mp3 |
|
common_voice_eo_25365472.mp3 |
|
common_voice_eo_25365480.mp3 |
|
common_voice_eo_25365532.mp3 |
|
common_voice_eo_25365695.mp3 |
|
common_voice_eo_25365744.mp3 |
|
common_voice_eo_25365804.mp3 |
|
common_voice_eo_25365836.mp3 |
|
common_voice_eo_25365855.mp3 |
|
common_voice_eo_25372587.mp3 |
|
common_voice_eo_25401060.mp3 |
|
common_voice_eo_25430837.mp3 |
|
common_voice_eo_25444509.mp3 |
|
common_voice_eo_25240777.mp3 |
|
common_voice_eo_24942754.mp3 |
|
common_voice_eo_24942755.mp3 |
|
common_voice_eo_24990372.mp3 |
|
common_voice_eo_24990385.mp3 |
|
common_voice_eo_24990390.mp3 |
|
common_voice_eo_24990397.mp3 |
|
common_voice_eo_24990413.mp3 |
|
common_voice_eo_24990427.mp3 |
|
common_voice_eo_24990429.mp3 |
|
common_voice_eo_24990435.mp3 |
|
common_voice_eo_24990441.mp3 |
|
common_voice_eo_24990454.mp3 |
|
common_voice_eo_24990457.mp3 |
|
common_voice_eo_24990459.mp3 |
|
common_voice_eo_24990490.mp3 |
|
common_voice_eo_25529345.mp3 |
|
common_voice_eo_25648750.mp3 |
|
common_voice_eo_28670472.mp3 |
|
common_voice_eo_27931966.mp3 |
|
common_voice_eo_28252265.mp3 |
|
common_voice_eo_25454951.mp3 |
|
common_voice_eo_25927616.mp3 |
|
common_voice_eo_25153203.mp3 |
|
common_voice_eo_25238543.mp3 |
|
common_voice_eo_25284237.mp3 |
|
common_voice_eo_25460131.mp3 |
|
common_voice_eo_25460185.mp3 |
|
common_voice_eo_25460186.mp3 |
|
common_voice_eo_25460188.mp3 |
|
common_voice_eo_25460189.mp3 |
|
common_voice_eo_25446723.mp3 |
|
common_voice_eo_26025150.mp3 |
|
common_voice_eo_26640189.mp3 |
|
common_voice_eo_26888468.mp3 |
|
common_voice_eo_24844824.mp3 |
|
common_voice_eo_25022506.mp3 |
|
common_voice_eo_25022507.mp3 |
|
common_voice_eo_25022516.mp3 |
|
common_voice_eo_25032858.mp3 |
|
common_voice_eo_25032859.mp3 |
|
common_voice_eo_25032865.mp3 |
|
common_voice_eo_25243988.mp3 |
|
common_voice_eo_25244009.mp3 |
|
common_voice_eo_25266094.mp3 |
|
common_voice_eo_25266141.mp3 |
|
common_voice_eo_25285278.mp3 |
|
common_voice_eo_25286768.mp3 |
|
common_voice_eo_25457171.mp3 |
|
common_voice_eo_25467641.mp3 |
|
common_voice_eo_25467723.mp3 |
|
common_voice_eo_25467791.mp3 |
|
common_voice_eo_25467820.mp3 |
|
common_voice_eo_25467943.mp3 |
|
common_voice_eo_25478612.mp3 |
|
common_voice_eo_25478623.mp3 |
|
common_voice_eo_25478631.mp3 |
|
common_voice_eo_25478756.mp3 |
|
common_voice_eo_25478762.mp3 |
|
common_voice_eo_25478768.mp3 |
|
common_voice_eo_25478769.mp3 |
|
common_voice_eo_25479150.mp3 |
|
common_voice_eo_25479203.mp3 |
|
common_voice_eo_25479229.mp3 |
|
common_voice_eo_25517673.mp3 |
|
common_voice_eo_25517677.mp3 |
|
common_voice_eo_25527739.mp3 |
|
common_voice_eo_25975149.mp3 |
|
common_voice_eo_26193748.mp3 |
|
common_voice_eo_28401039.mp3 |
|
common_voice_eo_28421315.mp3 |
|
common_voice_eo_28937347.mp3 |
|
common_voice_eo_24890414.mp3 |
|
common_voice_eo_25294479.mp3 |
|
common_voice_eo_25438966.mp3 |
|
common_voice_eo_28855568.mp3 |
|
common_voice_eo_29011007.mp3 |
|
common_voice_eo_24599888.mp3 |
|
common_voice_eo_26964252.mp3 |
|
common_voice_eo_26964496.mp3 |
|
common_voice_eo_26964510.mp3 |
|
common_voice_eo_25432789.mp3 |
|
common_voice_eo_26688158.mp3 |
|
common_voice_eo_28516354.mp3 |
|
common_voice_eo_24790865.mp3 |
|
common_voice_eo_24790897.mp3 |
|
common_voice_eo_24790898.mp3 |
|
common_voice_eo_24790899.mp3 |
|
common_voice_eo_24790900.mp3 |
|
common_voice_eo_25362713.mp3 |
|
common_voice_eo_27585084.mp3 |
|
common_voice_eo_24813131.mp3 |
|
common_voice_eo_25035262.mp3 |
|
common_voice_eo_26000289.mp3 |
|
common_voice_eo_26003943.mp3 |
|
common_voice_eo_26283983.mp3 |
|
common_voice_eo_28708931.mp3 |
|
common_voice_eo_28037217.mp3 |
|
common_voice_eo_29273106.mp3 |
|
common_voice_eo_26006657.mp3 |
|
common_voice_eo_25399924.mp3 |
|
common_voice_eo_27982431.mp3 |
|
common_voice_eo_25893779.mp3 |
|
common_voice_eo_27842061.mp3 |
|
common_voice_eo_25052385.mp3 |
|
common_voice_eo_25807395.mp3 |
|
common_voice_eo_25807985.mp3 |
|
common_voice_eo_25808039.mp3 |
|
common_voice_eo_25808407.mp3 |
|
common_voice_eo_25809036.mp3 |
|
common_voice_eo_27487795.mp3 |
|
common_voice_eo_28460556.mp3 |
|
common_voice_eo_28884851.mp3 |
|
common_voice_eo_24819719.mp3 |
|
common_voice_eo_25153594.mp3 |
|
common_voice_eo_25234585.mp3 |
|
common_voice_eo_25245164.mp3 |
|
common_voice_eo_27538877.mp3 |
|
common_voice_eo_24862771.mp3 |
|
common_voice_eo_25070167.mp3 |
|
common_voice_eo_26381720.mp3 |
|
common_voice_eo_28110376.mp3 |
|
``` |
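The filtering step of Alternative 3 could be sketched as follows. This is my own illustration: the tiny CER function (character-level edit distance over reference length) keeps the sketch dependency-free, the `"prediction"` column is hypothetical, and 0.5 is the cutoff suggested above:

```python
# Keep only examples whose CER under this model stays below a cutoff.

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, 1):
        cur = [i]
        for j, h in enumerate(hypothesis, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1] / len(reference)

def keep(example: dict, cutoff: float = 0.5) -> bool:
    return cer(example["sentence"], example["prediction"]) <= cutoff

print(cer("abc", "abd"))  # 0.333...

# With a datasets.Dataset carrying a hypothetical "prediction" column:
# filtered = raw_datasets["train"].filter(keep)
```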
|
|
|
### Alternative 3.1 |
|
|
|
For the files with missing or distorted audio, maybe change their target text to empty? Except for 'injabum'.
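That remapping could be sketched as below. This is my own illustration: `BAD_FILES` stands in for the lists above, and the `path`/`sentence` column names follow the Common Voice schema:

```python
# Blank the transcript of known-bad clips so the audio still contributes
# "this is not speech" examples instead of a hallucination target.
BAD_FILES = {"common_voice_eo_25467641.mp3"}  # ...plus the rest of the lists

def blank_bad_targets(example: dict) -> dict:
    filename = example["path"].rsplit("/", 1)[-1]
    if filename in BAD_FILES:
        example["sentence"] = ""
    return example

# raw_datasets["train"] = raw_datasets["train"].map(blank_bad_targets)
```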
|
|
|
### And also |
|
|
|
Since one can sign up at Common Voice to review Esperanto audio files, I've done so in the hope of making a small contribution to quality.
|
|