---
language:
  - 'no'
license: apache-2.0
tags:
  - audio
  - asr
  - automatic-speech-recognition
  - hf-asr-leaderboard
model-index:
  - name: scream_non_large_3e06_beams5_v2
    results: []
---

# scream_non_large_3e06_beams5_v2

This model is a fine-tuned version of openai/whisper-large-v2 on the NbAiLab/NCC_speech_all_v5 dataset. It achieves the following results on the evaluation set:

  • step: 29999
  • eval_loss: 1.2308
  • train_loss: 0.2858
  • eval_wer: 10.3228
  • eval_cer: 5.6236

## Model description

More information needed

## Intended uses & limitations

More information needed
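Pending fuller documentation, a minimal transcription sketch using the Transformers `pipeline` API. The repo id `NbAiLab/scream_non_large_3e06_beams5_v2` and the audio file path are assumptions, not confirmed by this card:

```python
from transformers import pipeline

# Assumed repo id; adjust to wherever this checkpoint is hosted.
asr = pipeline(
    "automatic-speech-recognition",
    model="NbAiLab/scream_non_large_3e06_beams5_v2",
)

# Transcribe a local Norwegian audio file (the path is a placeholder);
# num_beams=5 mirrors the num_beams hyperparameter listed below.
result = asr("sample.mp3", generate_kwargs={"num_beams": 5})
print(result["text"])
```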

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-06
  • lr_scheduler_type: linear
  • per_device_train_batch_size: 5
  • total_train_batch_size_per_node: 20
  • total_train_batch_size: 160
  • total_optimization_steps: 30,000
  • starting_optimization_step: None
  • finishing_optimization_step: 30,000
  • num_train_dataset_workers: 32
  • numb_hosts: 8
  • total_num_training_examples: 4,800,000
  • steps_per_epoch: 8543
  • num_beams: 5
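The batch-size figures above are mutually consistent; a small sketch (the 4 devices per host is an assumption implied by 20 / 5):

```python
# Compose the effective batch size from the hyperparameters listed above.
per_device_train_batch_size = 5
devices_per_host = 4   # assumption: total_train_batch_size_per_node / per-device size
num_hosts = 8

total_train_batch_size_per_node = per_device_train_batch_size * devices_per_host  # 20
total_train_batch_size = total_train_batch_size_per_node * num_hosts              # 160

# Examples seen over the full run matches total_num_training_examples.
total_optimization_steps = 30_000
examples_seen = total_train_batch_size * total_optimization_steps                 # 4,800,000
```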

### Training results

| step  | eval_loss | train_loss | eval_wer | eval_cer |
|------:|----------:|-----------:|---------:|---------:|
| 0     | 2.0006    | 0.8864     | 15.7125  | 8.0675   |
| 5000  | 1.4781    | 0.3285     | 10.3532  | 5.0340   |
| 10000 | 1.2340    | 0.3277     | 10.8709  | 5.7596   |
| 15000 | 1.1556    | 0.3272     | 10.3837  | 5.3515   |
| 20000 | 1.2220    | 0.2775     | 10.2314  | 5.5026   |
| 25000 | 1.2454    | 0.2367     | 10.1705  | 5.4976   |
| 29999 | 1.2308    | 0.2858     | 10.3228  | 5.6236   |

### Framework versions

  • Transformers 4.29.0.dev0
  • Datasets 2.11.0
  • Tokenizers 0.13.3