---
language:
- el
license: apache-2.0
tags:
- whisper-event
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_11_0
metrics:
- wer
model-index:
- name: whisper-sm-el-xs
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: mozilla-foundation/common_voice_11_0 el
      type: mozilla-foundation/common_voice_11_0
      config: el
      split: test
      args: el
    metrics:
    - name: Wer
      type: wer
      value: 20.63521545319465
---
# Whisper-Small (el) for Transcription
This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the mozilla-foundation/common_voice_11_0 el dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4805
- Wer: 20.6352
## Model description
This model is trained for Greek transcription on the Greek (el) subset of mozilla-foundation/common_voice_11_0, using the interleaved train and validation splits.
## Intended uses & limitations
This model was fine-tuned as part of the Whisper Fine-Tuning Event (December 2022).
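As a quick illustration of how the checkpoint can be used for Greek transcription, here is a minimal sketch with the `transformers` pipeline. The repo id and audio file name below are placeholders, not part of this card; substitute the actual Hub id of this model and your own audio.
```python
from transformers import pipeline

# Placeholder repo id; replace with the actual Hub id of this model.
asr = pipeline(
    "automatic-speech-recognition",
    model="<username>/whisper-sm-el-xs",
    chunk_length_s=30,  # Whisper operates on 30-second windows
)

# Pin the decoder to Greek transcription (rather than language
# detection or translation).
asr.model.config.forced_decoder_ids = asr.tokenizer.get_decoder_prompt_ids(
    language="greek", task="transcribe"
)

print(asr("example_greek_audio.wav")["text"])  # illustrative file name
```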
## Training and evaluation data
Training used the interleaved train and validation splits.
Evaluation was done on the test split.
Data was streamed from the Hugging Face Hub.
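For reference, the streamed, interleaved setup can be reproduced with 🤗 Datasets roughly as follows. This is a sketch rather than the exact preprocessing in the training script; the dataset is gated, so `use_auth_token=True` (after accepting its terms on the Hub) may be required.
```python
from datasets import load_dataset, interleave_datasets

# Stream the Greek (el) subset of Common Voice 11 from the Hub;
# nothing is downloaded up front.
train = load_dataset(
    "mozilla-foundation/common_voice_11_0", "el",
    split="train", streaming=True, use_auth_token=True,
)
validation = load_dataset(
    "mozilla-foundation/common_voice_11_0", "el",
    split="validation", streaming=True, use_auth_token=True,
)

# Interleave train and validation, mirroring
# --train_split_name "train+validation" in the command below.
train_data = interleave_datasets([train, validation])

# The test split is streamed separately and used only for evaluation.
test_data = load_dataset(
    "mozilla-foundation/common_voice_11_0", "el",
    split="test", streaming=True, use_auth_token=True,
)
```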
## Training procedure
The training script, `run_speech_recognition_seq2seq_streaming.py`, has been uploaded with the files of this repository.
The command used to run it was:
```bash
python ./run_speech_recognition_seq2seq_streaming.py \
--model_name_or_path "openai/whisper-small" \
--model_revision "main" \
--do_train True \
--do_eval True \
--use_auth_token False \
--freeze_encoder False \
--model_index_name "whisper-sm-el-xs" \
--dataset_name "mozilla-foundation/common_voice_11_0" \
--dataset_config_name "el" \
--audio_column_name "audio" \
--text_column_name "sentence" \
--max_duration_in_seconds 30 \
--train_split_name "train+validation" \
--eval_split_name "test" \
--do_lower_case False \
--do_remove_punctuation False \
--do_normalize_eval True \
--language "greek" \
--task "transcribe" \
--shuffle_buffer_size 500 \
--output_dir "./data/finetuningRuns/whisper-sm-el-xs" \
--per_device_train_batch_size 16 \
--gradient_accumulation_steps 4 \
--learning_rate 1e-5 \
--warmup_steps 500 \
--max_steps 5000 \
--gradient_checkpointing True \
--fp16 True \
--evaluation_strategy "steps" \
--per_device_eval_batch_size 8 \
--predict_with_generate True \
--generation_max_length 225 \
--save_steps 1000 \
--eval_steps 1000 \
--logging_steps 25 \
--report_to "tensorboard" \
--load_best_model_at_end True \
--metric_for_best_model "wer" \
--greater_is_better False \
--push_to_hub False \
--overwrite_output_dir True
```
### Training hyperparameters
The following hyperparameters were used during training (a Python sketch mapping them to `Seq2SeqTrainingArguments` follows the list):
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 5000
- mixed_precision_training: Native AMP
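For readers reproducing the run in Python rather than via the CLI, these hyperparameters map onto `Seq2SeqTrainingArguments` roughly as sketched below; the `output_dir` is a placeholder.
```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-sm-el-xs",   # placeholder path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,     # effective total batch size of 64
    learning_rate=1e-5,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=5000,
    gradient_checkpointing=True,
    fp16=True,                         # mixed precision (native AMP)
    evaluation_strategy="steps",
    eval_steps=1000,
    save_steps=1000,
    logging_steps=25,
    predict_with_generate=True,
    generation_max_length=225,
    report_to=["tensorboard"],
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,
)
```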
### Training results
| Training Loss | Epoch | Step | Validation Loss | Wer |
|:-------------:|:-----:|:----:|:---------------:|:-------:|
| 0.0024 | 18.01 | 1000 | 0.4246 | 21.0438 |
| 0.0003 | 37.01 | 2000 | 0.4805 | 20.6352 |
| 0.0001 | 56.01 | 3000 | 0.5102 | 20.8395 |
| 0.0001 | 75.0 | 4000 | 0.5296 | 21.0717 |
| 0.0001 | 94.0 | 5000 | 0.5375 | 21.0253 |
Here is the summary from the log of the run:
```
***** train metrics *****
epoch = 94.0
train_loss = 0.0222
train_runtime = 23:06:13.19
train_samples_per_second = 3.847
train_steps_per_second = 0.06
12/08/2022 11:20:17 - INFO - __main__ - *** Evaluate ***
***** eval metrics *****
epoch = 94.0
eval_loss = 0.4805
eval_runtime = 0:23:03.68
eval_samples_per_second = 1.226
eval_steps_per_second = 0.153
eval_wer = 20.6352
Thu 08 Dec 2022 11:43:22 AM EST
```
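Because `--do_normalize_eval True` was set, the WER above is computed on normalized text. A minimal sketch of that metric computation with the `evaluate` library (the prediction and reference strings are placeholders; in practice they come from `model.generate()` on the streamed test split):
```python
import evaluate
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

wer_metric = evaluate.load("wer")
normalizer = BasicTextNormalizer()

# Placeholder examples for illustration only.
predictions = ["καλημερα σας"]
references = ["Καλημέρα σας."]

# Normalize both sides before scoring, mirroring --do_normalize_eval True.
predictions = [normalizer(p) for p in predictions]
references = [normalizer(r) for r in references]

wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2f}")
```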
### Framework versions
- Transformers 4.26.0.dev0
- Pytorch 1.13.0
- Datasets 2.7.1.dev0
- Tokenizers 0.12.1