jlondonobo committed
Commit • a4e3282
Parent(s): 7982bb5
Update README.md

README.md CHANGED
@@ -27,54 +27,51 @@ model-index:
     value: 5.590020342630419
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
 
-# whisper-large-v2-pt
 
-This model is a fine-tuned version of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) on the mozilla-foundation/common_voice_11_0 pt dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.2821
-- Wer: 5.5900
 
-## Model description
 
-More information needed
 
-## Intended uses & limitations
 
-More information needed
-
-## Training and evaluation data
-
-More information needed
-
-## Training procedure
 
 ### Training hyperparameters
-
-The following hyperparameters were used during training:
-- learning_rate: 1e-05
-- train_batch_size: 16
-- eval_batch_size: 8
-- seed: 42
-- gradient_accumulation_steps: 2
-- total_train_batch_size: 32
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 500
-- training_steps: 5000
-- mixed_precision_training: Native AMP
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Wer |
 |:-------------:|:-----:|:----:|:---------------:|:------:|
-| 0.0828 | 1.09 | 1000 | 0.1868 | 6.
-| 0.0241 | 3.07 | 2000 | 0.2057 | 6.
-| 0.0084 | 5.06 | 3000 | 0.2367 | 6.
-| 0.0015 | 7.04 | 4000 | 0.2469 | 5.
-| 0.0009 | 9.02 | 5000 | 0.2821 | 5.
 
 
 ### Framework versions
     value: 5.590020342630419
 ---
 
+# Whisper Large V2 Portuguese 🇧🇷🇵🇹
+
+Welcome to **whisper large-v2** for Portuguese transcription 👋🏻
+
+Transcribe Portuguese audio to text with the highest precision.
+
+- Loss: 0.282
+- Wer: 5.590
+
+This model is a fine-tuned version of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) on the [mozilla-foundation/common_voice_11](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) dataset. If you want a lighter model, you may be interested in [jlondonobo/whisper-medium-pt](https://huggingface.co/jlondonobo/whisper-medium-pt), which achieves faster inference with almost no difference in WER.
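A minimal usage sketch (an editorial addition, not part of this commit), assuming the standard transformers ASR pipeline; the audio path is a placeholder:

```python
# Load the fine-tuned checkpoint with the automatic-speech-recognition pipeline.
from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="jlondonobo/whisper-large-v2-pt",
    chunk_length_s=30,  # split long audio into 30-second windows
)

# Pin the language and task so the model transcribes Portuguese rather than
# auto-detecting the language or translating to English.
transcriber.model.config.forced_decoder_ids = (
    transcriber.tokenizer.get_decoder_prompt_ids(language="pt", task="transcribe")
)

result = transcriber("audio.mp3")  # placeholder path
print(result["text"])
```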
 
+### Comparable models
+Reported **WER** is based on the evaluation subset of Common Voice.
+
+| Model | WER | # Parameters |
+|--------------------------------------------------|:--------:|:------------:|
+| [jlondonobo/whisper-large-v2-pt](https://huggingface.co/jlondonobo/whisper-large-v2-pt) | **5.590** 🤗 | 1550M |
+| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 6.300 | 1550M |
+| [jlondonobo/whisper-medium-pt](https://huggingface.co/jlondonobo/whisper-medium-pt) | 6.579 | 769M |
+| [jonatasgrosman/wav2vec2-large-xlsr-53-portuguese](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-portuguese) | 11.310 | 317M |
+| [Edresson/wav2vec2-large-xlsr-coraa-portuguese](https://huggingface.co/Edresson/wav2vec2-large-xlsr-coraa-portuguese) | 20.080 | 317M |
 
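For reference (an editorial note, not from the card): WER is the number of substitutions, deletions, and insertions divided by the number of reference words, WER = (S + D + I) / N. A sketch of computing it with the `evaluate` library; the toy transcripts are invented, and the card does not state its exact normalization protocol:

```python
# Toy WER computation: one substituted word out of four reference words -> 25%.
import evaluate

wer_metric = evaluate.load("wer")

references = ["olá mundo", "bom dia"]    # ground-truth transcripts
predictions = ["olá mundo", "bom dias"]  # hypothetical model outputs

wer = wer_metric.compute(references=references, predictions=predictions)
print(f"WER: {100 * wer:.3f}%")  # prints: WER: 25.000%
```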
 
 ### Training hyperparameters
+We used the following hyperparameters for training (see the sketch after this list):
+- `learning_rate`: 1e-05
+- `train_batch_size`: 16
+- `eval_batch_size`: 8
+- `seed`: 42
+- `gradient_accumulation_steps`: 2
+- `total_train_batch_size`: 32
+- `optimizer`: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- `lr_scheduler_type`: linear
+- `lr_scheduler_warmup_steps`: 500
+- `training_steps`: 5000
+- `mixed_precision_training`: Native AMP
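A sketch of how these hyperparameters might map onto transformers' `Seq2SeqTrainingArguments`; this is an assumption, not the author's actual training script, and `output_dir` is a placeholder:

```python
from transformers import Seq2SeqTrainingArguments

# Rough mapping of the reported hyperparameters. Adam with betas=(0.9, 0.999)
# and epsilon=1e-08 matches the Trainer's default optimizer settings.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-large-v2-pt",  # placeholder
    learning_rate=1e-5,
    per_device_train_batch_size=16,      # train_batch_size
    per_device_eval_batch_size=8,        # eval_batch_size
    seed=42,
    gradient_accumulation_steps=2,       # total_train_batch_size: 32
    lr_scheduler_type="linear",
    warmup_steps=500,                    # lr_scheduler_warmup_steps
    max_steps=5000,                      # training_steps
    fp16=True,                           # mixed_precision_training: Native AMP
)
```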
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss | Wer |
 |:-------------:|:-----:|:----:|:---------------:|:------:|
+| 0.0828 | 1.09 | 1000 | 0.1868 | 6.778 |
+| 0.0241 | 3.07 | 2000 | 0.2057 | 6.109 |
+| 0.0084 | 5.06 | 3000 | 0.2367 | 6.029 |
+| 0.0015 | 7.04 | 4000 | 0.2469 | 5.709 |
+| 0.0009 | 9.02 | 5000 | 0.2821 | 5.590 🤗 |
 
 
 ### Framework versions