metadata

language:
  - ar
license: apache-2.0
base_model: openai/whisper-base
tags:
  - whisper
  - Arabic
  - AR
  - speech to text
  - stt
  - transcription
datasets:
  - mozilla-foundation/common_voice_16_0
  - BelalElhossany/mgb2_audios_transcriptions_non_overlap
  - nadsoft/Jordan-Audio
metrics:
  - wer
model-index:
  - name: Whisper base arabic
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        metrics:
          - name: Wer
            type: wer
            value: 34.7

Whisper base arabic

This model is a fine-tuned version of openai/whisper-base for Arabic speech-to-text transcription. It achieves the following results on the evaluation set:

Loss: 0.44
Wer: 34.7

Training and evaluation data

Train set:

mozilla-foundation/common_voice_16_0 ar [train+validation]
BelalElhossany/mgb2_audios_transcriptions_non_overlap
nadsoft/Jordan-Audio

Validation set: 600 samples in total, drawn from the three datasets above. It was kept small to save time during training, because the model was trained on the Colab free tier. Note: given the small validation set, evaluate accuracy on your own data in whatever way you see fit.
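
One way to draw a small validation subset from several datasets is sketched below. The card does not say how the 600 samples were divided among the three sets, so the equal split, the `build_validation_subset` name, and the sampling seed are all assumptions made for illustration.

```python
import random


def build_validation_subset(datasets, total=600, seed=42):
    """Sample roughly total/len(datasets) examples from each dataset.

    `datasets` maps a dataset name to a list of examples. The equal
    split per dataset is an assumption, not the author's documented method.
    """
    rng = random.Random(seed)
    per_set, remainder = divmod(total, len(datasets))
    subset = []
    for index, (name, examples) in enumerate(datasets.items()):
        # Spread any remainder over the first few datasets.
        quota = per_set + (1 if index < remainder else 0)
        quota = min(quota, len(examples))
        picked = rng.sample(examples, quota)
        subset.extend((name, example) for example in picked)
    return subset
```

With three datasets and `total=600`, this picks 200 examples from each, provided every dataset has at least 200.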

Training procedure

Arabic diacritics (حركات, harakat) were removed from the transcripts. The model was trained on the combined dataset for 6 epochs. The fifth epoch gave the best result, so the released model is the epoch 5 checkpoint.
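
Diacritic removal of this kind can be sketched with a regular expression over the Unicode combining marks. The exact set of characters the author stripped is not stated, so the range below (tanween, short vowels, shadda, sukun, and the superscript alef) is an assumption, and `strip_harakat` is a hypothetical helper name.

```python
import re

# Assumed mark set: U+064B to U+0652 (tanween, fatha, damma, kasra,
# shadda, sukun) plus U+0670 (superscript alef).
HARAKAT = re.compile("[\u064B-\u0652\u0670]")


def strip_harakat(text: str) -> str:
    """Return `text` with Arabic diacritic marks removed."""
    return HARAKAT.sub("", text)
```

Since the training transcripts had no diacritics, the model's output is expected to be undiacritized as well, so applying the same step to reference text before scoring keeps the comparison fair.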

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 32
eval_batch_size: 16
seed: 42
gradient_accumulation_steps: 1
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
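
To make the scheduler settings concrete, the sketch below computes the learning rate at a given step for a linear schedule with warmup: a linear ramp from 0 to the peak rate over the warmup steps, then a linear decay to 0. The `total_steps` default of 8622 is an assumption derived from 6 epochs at 1437 steps per epoch, and `linear_schedule_lr` is a hypothetical name; the training framework's own scheduler may differ in small details.

```python
def linear_schedule_lr(step, base_lr=1e-5, warmup_steps=500, total_steps=8622):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: fall linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)
```

Under these assumptions the peak rate of 1e-05 is reached at step 500, about a third of the way through the first epoch.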

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
0.4603	1	1437	0.4931	45.8857
0.2867	2	2874	0.4493	36.9973
0.2494	3	4311	0.4219	43.5553
0.1435	4	5748	0.4408	40.2351
0.1345	5	7185	0.4407	34.7081
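
Checkpoint selection from these results can be sketched as picking the row with the lowest value of a chosen metric. The rows are copied from the table above (which lists five epochs); the `best_epoch_by` helper is hypothetical and only illustrates the comparison.

```python
# (training_loss, epoch, step, validation_loss, wer), copied from the table.
RESULTS = [
    (0.4603, 1, 1437, 0.4931, 45.8857),
    (0.2867, 2, 2874, 0.4493, 36.9973),
    (0.2494, 3, 4311, 0.4219, 43.5553),
    (0.1435, 4, 5748, 0.4408, 40.2351),
    (0.1345, 5, 7185, 0.4407, 34.7081),
]

VALIDATION_LOSS, WER = 3, 4


def best_epoch_by(rows, column):
    """Return the epoch number whose value in `column` is lowest."""
    return min(rows, key=lambda row: row[column])[1]


best_wer_epoch = best_epoch_by(RESULTS, WER)                # epoch 5
best_loss_epoch = best_epoch_by(RESULTS, VALIDATION_LOSS)   # epoch 3
```

The two criteria disagree here: validation loss is lowest at epoch 3, while WER is lowest at epoch 5, which matches the checkpoint that was kept.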