metadata

license: apache-2.0
base_model: openai/whisper-small
tags:
  - generated_from_trainer
metrics:
  - wer
model-index:
  - name: vi_whisper-small
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Vivos + Commonvoice
          type: vivos
          config: None
          split: None
        metrics:
          - name: Wer
            type: wer
            value: 21.8855

vi_whisper-small

This model is a fine-tuned version of openai/whisper-small on a mix of the VIVOS and Common Voice datasets. It achieves the following results on the evaluation set:

Loss: 0.2894
Wer: 21.8855
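
For readers unfamiliar with the metric, the sketch below shows how a corpus-level word error rate can be computed from reference and predicted transcripts. It is a plain-Python illustration, not the evaluation code used for this model; it assumes whitespace tokenization with no text normalization and reports the result as a percentage, which is how the value above appears to be expressed. The function names `edit_distance` and `wer_percent` are made up for this example.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer_percent(references, hypotheses):
    """Corpus-level WER: total word errors / total reference words, times 100."""
    total_errors = 0
    total_ref_words = 0
    for ref, hyp in zip(references, hypotheses):
        # Assumption: simple whitespace splitting, no casing or punctuation cleanup.
        ref_words, hyp_words = ref.split(), hyp.split()
        total_errors += edit_distance(ref_words, hyp_words)
        total_ref_words += len(ref_words)
    return 100.0 * total_errors / total_ref_words


# One dropped word out of four reference words gives a WER of 25%.
print(wer_percent(["xin chào các bạn"], ["xin chào bạn"]))
```

Because errors are summed over the whole corpus before dividing, long utterances weigh more than short ones; averaging per-utterance WER would give a different number.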

Model description

More information needed

Intended uses & limitations

More information needed
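
Since the metadata lists the task as automatic speech recognition, a minimal transcription sketch with the Transformers `pipeline` API might look like the following. The repository id `your-namespace/vi_whisper-small` is a placeholder (the actual Hub path is not given in this card), and `transcribe` is a hypothetical helper name. Passing a file path to the pipeline generally requires ffmpeg to be available for audio decoding.

```python
def transcribe(audio_path, model_id="your-namespace/vi_whisper-small"):
    """Transcribe one audio file with a fine-tuned Whisper checkpoint.

    model_id is a placeholder: replace it with the real Hub repository id
    or a local directory containing the checkpoint.
    """
    # Imported lazily so this module loads even where transformers is absent.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path)  # returns a dict with a "text" field
    return result["text"]


# Example call (needs the model weights and a local audio file):
# text = transcribe("sample.wav")
```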

Training and evaluation data

Training used the VIVOS dataset together with a cleaned version of Common Voice. Evaluation used the VIVOS evaluation set.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 16
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
training_steps: 8000
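
To make the schedule concrete, the sketch below computes the learning rate at a given step for a linear scheduler with warmup, using the values listed above (peak 0.0001, 1000 warmup steps, 8000 total steps). It is a plain-Python approximation of how a linear warmup-then-decay schedule usually behaves, not code taken from the training run; `linear_warmup_decay_lr` is a name chosen for this example.

```python
def linear_warmup_decay_lr(step, base_lr=1e-4, warmup_steps=1000, total_steps=8000):
    """Learning rate at `step`: linear ramp up to base_lr, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup phase: scale from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay phase: scale from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# Print the learning rate at each evaluation step of the run.
for step in range(1000, 8001, 1000):
    print(step, linear_warmup_decay_lr(step))
```

Under this schedule the learning rate peaks at step 1000 and has already dropped to roughly 57% of its peak by step 4000.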

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
0.249	1.1	1000	0.3766	32.1678
0.1416	2.2	2000	0.2881	46.4646
0.0839	3.3	3000	0.2799	22.7791
0.0546	4.41	4000	0.2894	21.8855
0.0256	5.51	5000	0.3023	32.2973
0.0111	6.61	6000	0.3061	31.0153
0.0028	7.71	7000	0.3143	27.1691
0.0014	8.81	8000	0.3187	27.3634
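
The table can be scanned programmatically to see which checkpoint the headline numbers correspond to. The sketch below copies the rows above into a list and picks the best step by WER and by validation loss; the variable names are illustrative only.

```python
# Rows copied from the training results table:
# (step, epoch, training_loss, validation_loss, wer)
TRAINING_LOG = [
    (1000, 1.1, 0.249, 0.3766, 32.1678),
    (2000, 2.2, 0.1416, 0.2881, 46.4646),
    (3000, 3.3, 0.0839, 0.2799, 22.7791),
    (4000, 4.41, 0.0546, 0.2894, 21.8855),
    (5000, 5.51, 0.0256, 0.3023, 32.2973),
    (6000, 6.61, 0.0111, 0.3061, 31.0153),
    (7000, 7.71, 0.0028, 0.3143, 27.1691),
    (8000, 8.81, 0.0014, 0.3187, 27.3634),
]

# Lowest WER and lowest validation loss do not fall on the same step.
best_by_wer = min(TRAINING_LOG, key=lambda row: row[4])
best_by_loss = min(TRAINING_LOG, key=lambda row: row[3])

print("best WER at step", best_by_wer[0], "->", best_by_wer[4])
print("best validation loss at step", best_by_loss[0], "->", best_by_loss[3])
```

The lowest WER is at step 4000, whose loss (0.2894) and WER (21.8855) match the evaluation results reported at the top of this card, while the lowest validation loss is at step 3000. Training loss keeps falling after that point as validation loss rises, which is consistent with overfitting in the later steps.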

Framework versions

Transformers 4.31.0.dev0
Pytorch 2.0.1+cu117
Datasets 2.13.1
Tokenizers 0.13.3