
Whisper Medium GA-EN Speech Translation Raw

This model is a fine-tuned version of openai/whisper-medium for Irish-to-English (GA-EN) speech translation, trained on the IWSLT-2023, FLEURS, BiteSize, and SpokenWords datasets. It achieves the following results on the evaluation set:

  • Loss: 1.5187
  • Bleu: 26.56
  • Chrf: 46.91
  • Wer: 76.6772
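
The card does not say which implementation produced the Wer figure. As a minimal sketch, assuming the standard definition (word-level edit distance divided by the number of reference words, times 100), the function below shows why Wer can exceed 100, as it does at several steps in the training results (for example 151.8235 at step 200): every extra word in the output counts as an insertion error. The function name and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance over reference length, as a percentage."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Hypothetical example: four inserted words against a two-word reference.
print(word_error_rate("the cat", "the cat sat on the mat"))  # 200.0
```

A hypothesis that is much longer than its reference therefore pushes Wer above 100 even when every reference word is present.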

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.03
  • training_steps: 4000
  • mixed_precision_training: Native AMP
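
As a rough illustration of what the scheduler settings above imply, the sketch below computes the learning rate at a given step for a linear schedule with warmup. It assumes the warmup length is the warmup ratio times the total steps, rounded up, and that the rate then decays linearly to zero at the final step; the card itself does not spell out these details, and the function name is hypothetical.

```python
import math

LEARNING_RATE = 1e-4    # learning_rate from the card
TRAINING_STEPS = 4000   # training_steps from the card
WARMUP_RATIO = 0.03     # lr_scheduler_warmup_ratio from the card

# Assumption: warmup steps = ratio * total steps, rounded up.
WARMUP_STEPS = math.ceil(TRAINING_STEPS * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay."""
    if step < WARMUP_STEPS:
        # Ramp up from 0 to the peak rate over the warmup steps.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Decay linearly from the peak rate to 0 at the last training step.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


for step in (0, 60, WARMUP_STEPS, 2060, TRAINING_STEPS):
    print(step, linear_lr(step))
```

Under these assumptions the warmup lasts 120 steps, so the peak rate is reached shortly after the first logged evaluation at step 100.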

Training results

| Training Loss | Epoch | Step | Bleu | Chrf | Validation Loss | Wer |
|---|---|---|---|---|---|---|
| 2.5874 | 0.0539 | 100 | 4.9 | 19.49 | 2.1785 | 114.0027 |
| 2.3237 | 0.1079 | 200 | 6.48 | 22.77 | 2.1129 | 151.8235 |
| 2.192 | 0.1618 | 300 | 7.92 | 25.9 | 2.0182 | 148.6718 |
| 1.9861 | 0.2157 | 400 | 10.55 | 28.55 | 1.8607 | 121.0266 |
| 1.8893 | 0.2697 | 500 | 16.68 | 33.64 | 1.8560 | 89.7794 |
| 1.8526 | 0.3236 | 600 | 8.83 | 30.12 | 1.7738 | 166.9968 |
| 1.6537 | 0.3776 | 700 | 10.94 | 33.83 | 1.6781 | 152.2287 |
| 1.7103 | 0.4315 | 800 | 16.9 | 36.4 | 1.6389 | 92.2557 |
| 1.4837 | 0.4854 | 900 | 13.81 | 34.5 | 1.6077 | 124.2233 |
| 1.2784 | 0.5394 | 1000 | 14.79 | 37.53 | 1.6103 | 116.3440 |
| 1.111 | 0.5933 | 1100 | 19.31 | 39.0 | 1.5579 | 93.6965 |
| 1.167 | 0.6472 | 1200 | 20.88 | 41.7 | 1.5210 | 91.6704 |
| 1.2217 | 0.7012 | 1300 | 21.29 | 41.72 | 1.4719 | 84.9167 |
| 1.0613 | 0.7551 | 1400 | 28.3 | 44.37 | 1.4663 | 67.1319 |
| 0.9256 | 0.8091 | 1500 | 27.5 | 45.59 | 1.4258 | 68.7078 |
| 0.8023 | 0.8630 | 1600 | 27.1 | 46.27 | 1.4027 | 72.7600 |
| 0.8327 | 0.9169 | 1700 | 27.03 | 46.19 | 1.3784 | 73.0302 |
| 0.7019 | 0.9709 | 1800 | 28.91 | 46.34 | 1.4127 | 67.4921 |
| 0.2681 | 1.0248 | 1900 | 28.53 | 47.12 | 1.3955 | 68.3026 |
| 0.2659 | 1.0787 | 2000 | 28.37 | 45.85 | 1.4194 | 68.1225 |
| 0.4202 | 1.1327 | 2100 | 27.53 | 44.0 | 1.5449 | 69.8784 |
| 0.4212 | 1.1866 | 2200 | 25.89 | 43.05 | 1.6060 | 70.1036 |
| 0.4124 | 1.2406 | 2300 | 24.31 | 41.55 | 1.6167 | 75.8217 |
| 0.4696 | 1.2945 | 2400 | 21.79 | 41.86 | 1.5904 | 85.0968 |
| 0.4018 | 1.3484 | 2500 | 25.36 | 43.45 | 1.6300 | 76.4070 |
| 0.4549 | 1.4024 | 2600 | 26.06 | 44.27 | 1.5540 | 71.9946 |
| 0.4018 | 1.4563 | 2700 | 26.22 | 45.42 | 1.5721 | 72.9851 |
| 0.3534 | 1.5102 | 2800 | 23.65 | 44.43 | 1.5488 | 80.0090 |
| 0.2907 | 1.5642 | 2900 | 24.04 | 42.57 | 1.5494 | 75.3715 |
| 0.3117 | 1.6181 | 3000 | 28.27 | 45.06 | 1.5691 | 67.2670 |
| 0.3379 | 1.6721 | 3100 | 30.52 | 47.42 | 1.4951 | 65.5561 |
| 0.3686 | 1.7260 | 3200 | 30.7 | 48.13 | 1.5010 | 64.8357 |
| 0.2855 | 1.7799 | 3300 | 27.19 | 46.47 | 1.5197 | 74.5610 |
| 0.2919 | 1.8339 | 3400 | 31.39 | 48.56 | 1.4974 | 63.5299 |
| 0.2582 | 1.8878 | 3500 | 30.18 | 48.54 | 1.4779 | 64.9257 |
| 0.2523 | 1.9417 | 3600 | 30.29 | 47.07 | 1.4835 | 66.6367 |
| 0.2005 | 1.9957 | 3700 | 29.89 | 47.95 | 1.4682 | 68.2125 |
| 0.0617 | 2.0496 | 3800 | 29.49 | 47.1 | 1.5221 | 67.6272 |
| 0.0661 | 2.1036 | 3900 | 26.93 | 46.91 | 1.5142 | 75.8217 |
| 0.0609 | 2.1575 | 4000 | 26.56 | 46.91 | 1.5187 | 76.6772 |
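
The headline metrics on this card match the last logged step (4000), but the log shows stronger scores at earlier steps. The sketch below, with values transcribed from the training results, finds the steps with the highest Bleu and the lowest validation loss. It is only an illustration of how the log could be inspected; the card does not say whether checkpoint selection was used or which checkpoint was published.

```python
# Columns: step, Bleu, Chrf, validation loss, Wer (transcribed from the log).
LOG = """
100 4.9 19.49 2.1785 114.0027
200 6.48 22.77 2.1129 151.8235
300 7.92 25.9 2.0182 148.6718
400 10.55 28.55 1.8607 121.0266
500 16.68 33.64 1.8560 89.7794
600 8.83 30.12 1.7738 166.9968
700 10.94 33.83 1.6781 152.2287
800 16.9 36.4 1.6389 92.2557
900 13.81 34.5 1.6077 124.2233
1000 14.79 37.53 1.6103 116.3440
1100 19.31 39.0 1.5579 93.6965
1200 20.88 41.7 1.5210 91.6704
1300 21.29 41.72 1.4719 84.9167
1400 28.3 44.37 1.4663 67.1319
1500 27.5 45.59 1.4258 68.7078
1600 27.1 46.27 1.4027 72.7600
1700 27.03 46.19 1.3784 73.0302
1800 28.91 46.34 1.4127 67.4921
1900 28.53 47.12 1.3955 68.3026
2000 28.37 45.85 1.4194 68.1225
2100 27.53 44.0 1.5449 69.8784
2200 25.89 43.05 1.6060 70.1036
2300 24.31 41.55 1.6167 75.8217
2400 21.79 41.86 1.5904 85.0968
2500 25.36 43.45 1.6300 76.4070
2600 26.06 44.27 1.5540 71.9946
2700 26.22 45.42 1.5721 72.9851
2800 23.65 44.43 1.5488 80.0090
2900 24.04 42.57 1.5494 75.3715
3000 28.27 45.06 1.5691 67.2670
3100 30.52 47.42 1.4951 65.5561
3200 30.7 48.13 1.5010 64.8357
3300 27.19 46.47 1.5197 74.5610
3400 31.39 48.56 1.4974 63.5299
3500 30.18 48.54 1.4779 64.9257
3600 30.29 47.07 1.4835 66.6367
3700 29.89 47.95 1.4682 68.2125
3800 29.49 47.1 1.5221 67.6272
3900 26.93 46.91 1.5142 75.8217
4000 26.56 46.91 1.5187 76.6772
"""

# Parse each line into a tuple of floats: (step, bleu, chrf, val_loss, wer).
rows = [tuple(float(x) for x in line.split()) for line in LOG.strip().splitlines()]

best_bleu = max(rows, key=lambda r: r[1])    # highest Bleu
lowest_loss = min(rows, key=lambda r: r[3])  # lowest validation loss
final = rows[-1]                             # last logged step

print(f"best Bleu {best_bleu[1]} at step {int(best_bleu[0])}")
print(f"lowest validation loss {lowest_loss[3]} at step {int(lowest_loss[0])}")
print(f"final Bleu {final[1]} at step {int(final[0])}")
```

Bleu peaks at step 3400 (31.39) while validation loss is lowest at step 1700 (1.3784), so the two criteria would pick different checkpoints, and both differ from the final step.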

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.2.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size: 764M params (Safetensors, tensor type F32)

Finetuned from: openai/whisper-medium

Datasets used to train ymoslem/whisper-medium-ga2en-v1.3.1-4k-r: IWSLT-2023, FLEURS, BiteSize, and SpokenWords

Evaluation results

  • Bleu on IWSLT-2023, FLEURS, BiteSize, and SpokenWords (self-reported): 26.560
  • Wer on IWSLT-2023, FLEURS, BiteSize, and SpokenWords (self-reported): 76.677