jonatasgrosman committed on
Commit
2ac91b5
1 Parent(s): a9b9a26

add evaluation

README.md ADDED
@@ -0,0 +1,101 @@
+ ---
+ language:
+ - fr
+ license: apache-2.0
+ tags:
+ - whisper-event
+ - generated_from_trainer
+ datasets:
+ - mozilla-foundation/common_voice_11_0
+ metrics:
+ - wer
+ - cer
+ model-index:
+ - name: Whisper Large French
+   results:
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: mozilla-foundation/common_voice_11_0 fr
+       type: mozilla-foundation/common_voice_11_0
+       config: fr
+       split: test
+       args: fr
+     metrics:
+     - name: WER
+       type: wer
+       value: 9.086701085988962
+     - name: CER
+       type: cer
+       value: 3.327312134958326
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: google/fleurs fr_fr
+       type: google/fleurs
+       config: fr_fr
+       split: test
+       args: fr_fr
+     metrics:
+     - name: WER
+       type: wer
+       value: 8.6863088842391
+     - name: CER
+       type: cer
+       value: 5.089870653452041
+ ---
+
+ # Whisper Large French
+
+ This model is a fine-tuned version of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) on French using the train split of [Common Voice 11](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0).
+
+ ## Usage
+
+ ```python
+
+ from transformers import pipeline
+
+ transcriber = pipeline(
+     "automatic-speech-recognition",
+     model="jonatasgrosman/whisper-large-fr-cv11"
+ )
+
+ transcriber.model.config.forced_decoder_ids = (
+     transcriber.tokenizer.get_decoder_prompt_ids(
+         language="fr",
+         task="transcribe"
+     )
+ )
+
+ transcription = transcriber("path/to/my_audio.wav")
+
+ ```
+
+ ## Evaluation
+
+ I evaluated the model on the test splits of two datasets: [Common Voice 11](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) (the same dataset used for fine-tuning) and [Fleurs](https://huggingface.co/datasets/google/fleurs) (not seen during fine-tuning). Because Whisper can transcribe casing and punctuation, I evaluated the model in two scenarios: one using the raw text and one using normalized text (lowercased, with punctuation removed). For Fleurs, I also evaluated the model on only the samples that contain no numerical values, because Fleurs writes numerical values differently from Common Voice, the dataset used for fine-tuning, and this difference is expected to hurt the model's performance on such transcriptions.
+
+ ### Common Voice 11
+
+ | Model | CER (%) | WER (%) |
+ | --- | --- | --- |
+ | [jonatasgrosman/whisper-large-fr-cv11](https://huggingface.co/jonatasgrosman/whisper-large-fr-cv11) | 4.31 | 13.66 |
+ | [jonatasgrosman/whisper-large-fr-cv11](https://huggingface.co/jonatasgrosman/whisper-large-fr-cv11) + text normalization | 3.33 | 9.09 |
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 7.17 | 18.99 |
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + text normalization | 5.74 | 12.82 |
+
+
+ ### Fleurs
+
+ | Model | CER (%) | WER (%) |
+ | --- | --- | --- |
+ | [jonatasgrosman/whisper-large-fr-cv11](https://huggingface.co/jonatasgrosman/whisper-large-fr-cv11) | 4.96 | 14.24 |
+ | [jonatasgrosman/whisper-large-fr-cv11](https://huggingface.co/jonatasgrosman/whisper-large-fr-cv11) + text normalization | 5.09 | 8.69 |
+ | [jonatasgrosman/whisper-large-fr-cv11](https://huggingface.co/jonatasgrosman/whisper-large-fr-cv11) + keep only non-numeric samples | 3.14 | 12.10 |
+ | [jonatasgrosman/whisper-large-fr-cv11](https://huggingface.co/jonatasgrosman/whisper-large-fr-cv11) + text normalization + keep only non-numeric samples | 3.60 | 6.94 |
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 3.55 | 12.81 |
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + text normalization | 3.76 | 7.59 |
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + keep only non-numeric samples | 3.12 | 11.24 |
+ | [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + text normalization + keep only non-numeric samples | 3.65 | 6.99 |
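
The evaluation scenarios described in the README (raw text, normalized text, and non-numeric samples only) can be sketched in plain Python. This is a minimal illustration, not the script that produced the numbers in the tables. The helpers `normalize`, `edit_distance`, `error_rate`, and `is_numeric` are hypothetical names; the punctuation rule (every Unicode punctuation character becomes a space), the digit-based filter, and the corpus-level aggregation are assumptions; the sample sentences are made up.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase and remove punctuation (assumed approximation of the README's normalization)."""
    text = text.lower()
    # Assumption: replace every Unicode punctuation character (category P*) with a space.
    text = "".join(" " if unicodedata.category(ch).startswith("P") else ch for ch in text)
    # Collapse runs of whitespace left behind by the removed punctuation.
    return re.sub(r"\s+", " ", text).strip()


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (lists of words or strings of characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit: str = "word") -> float:
    """Corpus-level WER (unit="word") or CER (unit="char"), in percent."""
    split = str.split if unit == "word" else list
    edits = total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_units, hyp_units = split(ref), split(hyp)
        edits += edit_distance(ref_units, hyp_units)
        total += len(ref_units)
    return 100.0 * edits / total


def is_numeric(text: str) -> bool:
    """Assumption: a sample counts as numeric if its reference contains a digit."""
    return any(ch.isdigit() for ch in text)


# Made-up samples, for illustration only.
refs = ["Bonjour, le monde !", "Il a 3 chats."]
hyps = ["bonjour le monde", "Il a trois chats."]

raw_wer = error_rate(refs, hyps)
norm_refs = [normalize(r) for r in refs]
norm_hyps = [normalize(h) for h in hyps]
norm_wer = error_rate(norm_refs, norm_hyps)

# Keep only the pairs whose reference has no digits, then score the normalized text.
kept = [(r, h) for r, h in zip(norm_refs, norm_hyps) if not is_numeric(r)]
non_numeric_wer = error_rate([r for r, _ in kept], [h for _, h in kept])

print(f"raw WER: {raw_wer:.2f}")
print(f"normalized WER: {norm_wer:.2f}")
print(f"normalized, non-numeric WER: {non_numeric_wer:.2f}")
```

On these toy samples the casing and punctuation mismatches disappear after normalization, and the remaining error comes from the numeral written as a digit in the reference and as a word in the hypothesis, which is the effect the non-numeric scenario is meant to isolate.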
evaluation_cv11_test.json ADDED
The diff for this file is too large to render. See raw diff
 
evaluation_fleurs_test.json ADDED
The diff for this file is too large to render. See raw diff
 
evaluation_whisper-large-v2_cv11_test.json ADDED
The diff for this file is too large to render. See raw diff
 
evaluation_whisper-large-v2_fleurs_test.json ADDED
The diff for this file is too large to render. See raw diff