xekri/wav2vec2-common_voice_13_0-eo-3

@@ -231,16 +231,16 @@ The following hyperparameters were used during training:
 While debugging other training sessions that used more data from the Esperanto Common Voice dataset, where some loss calculations returned either `inf` or `nan`, I found that some training-set examples had surprisingly high CER when transcribed by this model. Some examples:
-| file | Actual<br>Predicted | CER | Comment |
 |:-----|:--------------------|:----|:--------|
-|common_voice_eo_25365027.mp3 | en la hansaj agentejoj komercistoj el la regiono renkontis kolegojn el aliaj regionoj<br>a taaj keo eoj eejn kigos eegoj  eioeegiooj| 0.61 | No audio |
-|common_voice_eo_25365472.mp3 | ili vendas armilojn kaj teknologiojn al la fanatikuloj por gajni monon monon monon<br>ila mamato aiil ajn kno ion a a aotigojn pu aiooo aj knon | 0.55 | Barely any audio, distorted |
-|common_voice_eo_25365836.mp3 | industria apliko estas la kreado de modifitaj bakterioj kiuj produktas deziratan kemian substancon<br>iiti sieetas la eeadooddddooiooaotooeioj aiicenon | 0.67 | Barely any audio, distorted |
-|2600 | ili akiras plenkreskan plumaron nur en la kvina jaro<br>ili aaros peetaj patato a a sia ro | 0.52 | It's literally someone saying 'injabum'. Thanks, troll. |
-|7333 | poste sekvas difinoj de la termino<br>po | 0.94 | No audio |
-|7334 | li gvidis multajn kursojn laŭ la csehmetodo<br>po | 0.98 | No audio |
-|7429 | tamen pro la rekonstruo de kluzoj ne eblas trapasi komplete<br>po | 0.97 | No audio |
-|11662 | lingvotesto estas postulata ekzemple por akceptiĝo en anglalingvaj altlernejoj<br>linkonteto estastitot etateerteito en pootaeaje lgijoj | 0.58 | No audio |
 Some examples have no audio. All of these files in the dataset are completely useless, and should be removed from the training set.
@@ -270,7 +270,7 @@ By running `run_speech_recognition_ctc` with `do_train=false`, setting `model_na
             metrics[f"{metric_key_prefix}_loss"] = all_losses.mean().item()
 ```
-Doing this shows that of the 14913 examples in `test`, the following file results in `inf` loss:
 `common_voice_eo_25167318.mp3`
@@ -278,7 +278,7 @@ The audio on this is severely garbled. This should absolutely be filtered out of
 No `validation` samples result in `inf` or `nan`.
-The following files out of the 143984 examples in `train` result in `inf` loss:
 ```txt
 common_voice_eo_25467641.mp3
@@ -313,6 +313,8 @@ Since this model seems to work well enough, I could run inference on all samples
 #### Test set
 ```txt
 common_voice_eo_25214319.mp3
 common_voice_eo_25006596.mp3
@@ -387,11 +389,12 @@ common_voice_eo_25252698.mp3
 common_voice_eo_25518636.mp3
 ```
-Note on `test[100]` and `test[101]`: We know that `saluton kiel vi fartas` and `atendu momenton` is a good start, but if that's not the text to record, you're not really helping.
 #### Validation set
-141 of
 ```txt
 common_voice_eo_25392669.mp3
 common_voice_eo_25392674.mp3
@@ -414,7 +417,6 @@ common_voice_eo_27380623.mp3
 I didn't include some which had high CER because of hallucinations during a one-word recording with lots of silence before and after. The recording itself is fine on these.
 #### Training set
 135 of 143984 examples yielded high CER. I removed some from this list that had high CER but sounded fine.

 While debugging other training sessions that used more data from the Esperanto Common Voice dataset, where some loss calculations returned either `inf` or `nan`, I found that some training-set examples had surprisingly high CER when transcribed by this model. Some examples:
+| file | Actual<br>---<br>Predicted | CER | Comment |
 |:-----|:--------------------|:----|:--------|
+|common_voice_eo_25365027.mp3 | en la hansaj agentejoj komercistoj el la regiono renkontis kolegojn el aliaj regionoj<br>---<br>a taaj keo eoj eejn kigos eegoj  eioeegiooj| 0.61 | No audio |
+|common_voice_eo_25365472.mp3 | ili vendas armilojn kaj teknologiojn al la fanatikuloj por gajni monon monon monon<br>---<br>ila mamato aiil ajn kno ion a a aotigojn pu aiooo aj knon | 0.55 | Barely any audio, distorted |
+|common_voice_eo_25365836.mp3 | industria apliko estas la kreado de modifitaj bakterioj kiuj produktas deziratan kemian substancon<br>---<br>iiti sieetas la eeadooddddooiooaotooeioj aiicenon | 0.67 | Barely any audio, distorted |
+|2600 | ili akiras plenkreskan plumaron nur en la kvina jaro<br>---<br>ili aaros peetaj patato a a sia ro | 0.52 | It's literally someone saying 'injabum'. Thanks, troll. |
+|7333 | poste sekvas difinoj de la termino<br>---<br>po | 0.94 | No audio |
+|7334 | li gvidis multajn kursojn laŭ la csehmetodo<br>---<br>po | 0.98 | No audio |
+|7429 | tamen pro la rekonstruo de kluzoj ne eblas trapasi komplete<br>---<br>po | 0.97 | No audio |
+|11662 | lingvotesto estas postulata ekzemple por akceptiĝo en anglalingvaj altlernejoj<br>---<br>linkonteto estastitot etateerteito en pootaeaje lgijoj | 0.58 | No audio |
 Some examples have no audio. All of these files in the dataset are completely useless, and should be removed from the training set.
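For reference, the CER values in the table can be understood as the character-level edit distance between the actual and predicted text, divided by the length of the actual text. The following is a minimal sketch under that assumption; the helper names `edit_distance` and `cer` are illustrative and not taken from the training script, which may compute the metric with a library instead.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum number of character insertions,
    deletions, and substitutions needed to turn `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i characters of ref
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(reference: str, prediction: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, prediction) / len(reference)


# Row 7333 from the table: the prediction is just "po".
print(round(cer("poste sekvas difinoj de la termino", "po"), 2))
```

With the reference and prediction from row 7333, this prints `0.94`, which agrees with the CER listed in the table for that example.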
             metrics[f"{metric_key_prefix}_loss"] = all_losses.mean().item()
 ```
+Doing this shows that of the 14913 examples in `test`, the following example results in `inf` loss:
 `common_voice_eo_25167318.mp3`
 No `validation` samples result in `inf` or `nan`.
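A single non-finite per-example loss is enough to make a mean over all losses `inf` or `nan`, which is why the offending files have to be identified individually. As a rough sketch, assuming the per-example losses have already been collected into a mapping from file name to loss (the mapping and the `non_finite_examples` helper are hypothetical, not part of `run_speech_recognition_ctc`), the check could look like this:

```python
import math


def non_finite_examples(losses_by_file: dict[str, float]) -> list[str]:
    """Return the file names whose loss is inf or nan, in input order."""
    return [name for name, loss in losses_by_file.items()
            if not math.isfinite(loss)]


# Illustrative values only: the finite loss and the nan entry are made up;
# the first file is the one reported above as yielding inf loss in `test`.
losses = {
    "common_voice_eo_25167318.mp3": float("inf"),
    "hypothetical_clean_example.mp3": 0.31,
    "hypothetical_nan_example.mp3": float("nan"),
}
print(non_finite_examples(losses))
```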
+The following 18 out of 143984 examples in `train` result in `inf` loss:
 ```txt
 common_voice_eo_25467641.mp3
 #### Test set
+71 of 14913 examples in the test set show high CER.
 ```txt
 common_voice_eo_25214319.mp3
 common_voice_eo_25006596.mp3
 common_voice_eo_25518636.mp3
 ```
+Note on two of the examples: We know that _saluton kiel vi fartas_ ("Hello, how are you") and _atendu momenton_ ("Wait a moment") are a good start in learning Esperanto, but if that's not the text to record, you're not really helping.
 #### Validation set
+17 of 14909 examples in the validation set show high CER.
 ```txt
 common_voice_eo_25392669.mp3
 common_voice_eo_25392674.mp3
 I didn't include some which had high CER because of hallucinations during a one-word recording with lots of silence before and after. The recording itself is fine on these.
 #### Training set
 135 of 143984 examples yielded high CER. I removed some from this list that had high CER but sounded fine.
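Once the lists of bad files are settled, excluding them before training is a simple membership test on the file name. The sketch below works on plain dictionaries and assumes each example carries a `path` field holding the audio file path; both the field name and the `drop_excluded` helper are assumptions for illustration. With the Hugging Face `datasets` library, the same predicate could be passed to `Dataset.filter`.

```python
import os


def drop_excluded(examples, excluded_files):
    """Keep only examples whose audio file name is not in the exclusion list."""
    excluded = set(excluded_files)  # set lookup keeps the filter linear-time
    return [ex for ex in examples
            if os.path.basename(ex["path"]) not in excluded]


# The excluded names come from the lists in this document; the directory
# prefix and the "kept" example are made up for illustration.
excluded = ["common_voice_eo_25365027.mp3", "common_voice_eo_25365472.mp3"]
examples = [
    {"path": "clips/common_voice_eo_25365027.mp3"},
    {"path": "clips/hypothetical_good_example.mp3"},
    {"path": "clips/common_voice_eo_25365472.mp3"},
]
kept = drop_excluded(examples, excluded)
print([ex["path"] for ex in kept])
```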