---
language:
- el
license: apache-2.0
tags:
- whisper-event
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_11_0
metrics:
- wer
model-index:
- name: whisper-sm-el-xs
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: mozilla-foundation/common_voice_11_0 el
      type: mozilla-foundation/common_voice_11_0
      config: el
      split: test
      args: el
    metrics:
    - name: Wer
      type: wer
      value: 20.63521545319465
---
# Whisper-Small (el) for Transcription
This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the mozilla-foundation/common_voice_11_0 el dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4805
- Wer: 20.6352
## Model description
This model is trained for Greek transcription on the Greek (el) subset of mozilla-foundation/common_voice_11_0, using the interleaved train and validation splits.
## Intended uses & limitations
This model was fine-tuned as part of the Whisper Fine-Tuning Event (December 2022).
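As a quick illustration of how the checkpoint can be used for Greek transcription, here is a minimal sketch with the `transformers` pipeline. The repo id and audio file name below are placeholders, not part of this card; substitute the actual Hub id of this model and your own audio.
```python
from transformers import pipeline

# Placeholder repo id; replace with the actual Hub id of this model.
asr = pipeline(
    "automatic-speech-recognition",
    model="<username>/whisper-sm-el-xs",
    chunk_length_s=30,  # Whisper operates on 30-second windows
)

# Pin the decoder to Greek transcription (rather than language
# detection or translation).
asr.model.config.forced_decoder_ids = asr.tokenizer.get_decoder_prompt_ids(
    language="greek", task="transcribe"
)

print(asr("example_greek_audio.wav")["text"])  # illustrative file name
```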
## Training and evaluation data
Training used the interleaved train and validation splits.
Evaluation was done on the test split.
Data was streamed from the Hugging Face Hub.
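For reference, the streamed, interleaved setup can be reproduced with 🤗 Datasets roughly as follows. This is a sketch rather than the exact preprocessing in the training script; the dataset is gated, so `use_auth_token=True` (after accepting its terms on the Hub) may be required.
```python
from datasets import load_dataset, interleave_datasets

# Stream the Greek (el) subset of Common Voice 11 from the Hub;
# nothing is downloaded up front.
train = load_dataset(
    "mozilla-foundation/common_voice_11_0", "el",
    split="train", streaming=True, use_auth_token=True,
)
validation = load_dataset(
    "mozilla-foundation/common_voice_11_0", "el",
    split="validation", streaming=True, use_auth_token=True,
)

# Interleave train and validation, mirroring
# --train_split_name "train+validation" in the command below.
train_data = interleave_datasets([train, validation])

# The test split is streamed separately and used only for evaluation.
test_data = load_dataset(
    "mozilla-foundation/common_voice_11_0", "el",
    split="test", streaming=True, use_auth_token=True,
)
```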
## Training procedure
The training script, `run_speech_recognition_seq2seq_streaming.py`, has been uploaded with the files of this repository.
The command used to run it was:
```bash
python ./run_speech_recognition_seq2seq_streaming.py \
--model_name_or_path "openai/whisper-small" \
--model_revision "main" \
--do_train True \
--do_eval True \
--use_auth_token False \
--freeze_encoder False \
--model_index_name "whisper-sm-el-xs" \
--dataset_name "mozilla-foundation/common_voice_11_0" \
--dataset_config_name "el" \
--audio_column_name "audio" \
--text_column_name "sentence" \
--max_duration_in_seconds 30 \
--train_split_name "train+validation" \
--eval_split_name "test" \
--do_lower_case False \
--do_remove_punctuation False \
--do_normalize_eval True \
--language "greek" \
--task "transcribe" \
--shuffle_buffer_size 500 \
--output_dir "./data/finetuningRuns/whisper-sm-el-xs" \
--per_device_train_batch_size 16 \
--gradient_accumulation_steps 4 \
--learning_rate 1e-5 \
--warmup_steps 500 \
--max_steps 5000 \
--gradient_checkpointing True \
--fp16 True \
--evaluation_strategy "steps" \
--per_device_eval_batch_size 8 \
--predict_with_generate True \
--generation_max_length 225 \
--save_steps 1000 \
--eval_steps 1000 \
--logging_steps 25 \
--report_to "tensorboard" \
--load_best_model_at_end True \
--metric_for_best_model "wer" \
--greater_is_better False \
--push_to_hub False \
--overwrite_output_dir True
```
### Training hyperparameters
The following hyperparameters were used during training (a Python sketch mapping them to `Seq2SeqTrainingArguments` follows the list):
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 5000
- mixed_precision_training: Native AMP
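For readers reproducing the run in Python rather than via the CLI, these hyperparameters map onto `Seq2SeqTrainingArguments` roughly as sketched below; the `output_dir` is a placeholder.
```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-sm-el-xs",   # placeholder path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,     # effective total batch size of 64
    learning_rate=1e-5,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=5000,
    gradient_checkpointing=True,
    fp16=True,                         # mixed precision (native AMP)
    evaluation_strategy="steps",
    eval_steps=1000,
    save_steps=1000,
    logging_steps=25,
    predict_with_generate=True,
    generation_max_length=225,
    report_to=["tensorboard"],
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,
)
```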
### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer |
|:-------------:|:-----:|:----:|:---------------:|:-------:|
| 0.0024 | 18.01 | 1000 | 0.4246 | 21.0438 |
| 0.0003 | 37.01 | 2000 | 0.4805 | 20.6352 |
| 0.0001 | 56.01 | 3000 | 0.5102 | 20.8395 |
| 0.0001 | 75.0 | 4000 | 0.5296 | 21.0717 |
| 0.0001 | 94.0 | 5000 | 0.5375 | 21.0253 |
Here is the summary from the log of the run:
```
***** train metrics *****
epoch = 94.0
train_loss = 0.0222
train_runtime = 23:06:13.19
train_samples_per_second = 3.847
train_steps_per_second = 0.06
12/08/2022 11:20:17 - INFO - __main__ - *** Evaluate ***
***** eval metrics *****
epoch = 94.0
eval_loss = 0.4805
eval_runtime = 0:23:03.68
eval_samples_per_second = 1.226
eval_steps_per_second = 0.153
eval_wer = 20.6352
Thu 08 Dec 2022 11:43:22 AM EST
```
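Because `--do_normalize_eval True` was set, the WER above is computed on normalized text. A minimal sketch of that metric computation with the `evaluate` library (the prediction and reference strings are placeholders; in practice they come from `model.generate()` on the streamed test split):
```python
import evaluate
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

wer_metric = evaluate.load("wer")
normalizer = BasicTextNormalizer()

# Placeholder examples for illustration only.
predictions = ["καλημερα σας"]
references = ["Καλημέρα σας."]

# Normalize both sides before scoring, mirroring --do_normalize_eval True.
predictions = [normalizer(p) for p in predictions]
references = [normalizer(r) for r in references]

wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2f}")
```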
### Framework versions
- Transformers 4.26.0.dev0
- Pytorch 1.13.0
- Datasets 2.7.1.dev0
- Tokenizers 0.12.1