---
extra_gated_prompt: >-
  This is a BETA model. To use this model, you agree to the [licensing
  terms](license.md).
language:
  - 'no'
license: apache-2.0
tags:
  - audio
  - asr
  - automatic-speech-recognition
  - hf-asr-leaderboard
model-index:
  - name: Small Scream - April Beta
    results: []
---

# Small Scream - April Beta

This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the NbAiLab/NCC_speech_all_v5 dataset. Evaluation uses a beam size of 5. It achieves the following results on the evaluation set:

- step: 49999
- eval_loss: 0.5299
- train_loss: 0.3369
- eval_wer: 11.9976
- eval_cer: 5.6236
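
The numbers above were produced with beam search (beam size 5). For reference, here is a minimal sketch of decoding a short clip (up to 30 seconds) with the same beam size; `myfile.wav` is a placeholder, and this is an illustration rather than the exact evaluation setup:

```python
import librosa
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "NbAiLab/small_scream_april_beta"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# 16 kHz mono audio, as Whisper expects; "myfile.wav" is a placeholder
samples, _ = librosa.load("myfile.wav", sr=16000, mono=True)
inputs = processor(samples, sampling_rate=16000, return_tensors="pt")

# num_beams=5 matches the beam size used for the evaluation numbers above
predicted_ids = model.generate(inputs.input_features, num_beams=5, max_new_tokens=128)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```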

## Model description

This is a BETA version. You need to accept the terms and conditions to use it.

## Using the Model

There are several ways to use this model, and we hope people will convert it into other formats as well. The code below lets you transcribe long audio files with Transformers:

```python
import torch
import librosa
from transformers import pipeline

# Use "cuda" if you have a GPU, "mps" for Metal on a Mac, and "cpu" otherwise
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

pipe = pipeline(
    "automatic-speech-recognition",
    model="NbAiLab/small_scream_april_beta",
    chunk_length_s=30,
    device=device,
    # "no" is Norwegian, matching the model's language tag
    generate_kwargs={"language": "no", "task": "transcribe", "max_new_tokens": 128},
)

# Load the audio as 16 kHz mono; librosa can also read mp3 and other formats
audio_path = "myfile.wav"
samples, sample_rate = librosa.load(audio_path, sr=16000, mono=True)

# Run the pipeline on the raw samples
prediction = pipe(samples)["text"]

print(prediction)
```
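
Whisper models operate on 16 kHz mono audio, which is why the file is resampled on load. Setting `chunk_length_s=30` makes the pipeline split longer recordings into 30-second windows, so files of arbitrary length can be transcribed.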

## Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 6e-06
- lr_scheduler_type: linear
- per_device_train_batch_size: 16
- total_train_batch_size_per_node: 64
- total_train_batch_size: 64
- total_optimization_steps: 50000
- starting_optimization_step: None
- finishing_optimization_step: 50000
- num_train_dataset_workers: 32
- total_num_training_examples: 3200000
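
For orientation, here is a minimal sketch of how these settings could map onto Hugging Face `Seq2SeqTrainingArguments`. This is an assumption about the setup rather than the actual training script (which is not part of this card), and `output_dir` is a hypothetical path:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: mirrors the hyperparameters listed above. The total batch size
# of 64 with a per-device batch size of 16 implies 4 devices (or gradient
# accumulation); adjust to your hardware.
training_args = Seq2SeqTrainingArguments(
    output_dir="./small_scream_april_beta",  # hypothetical
    per_device_train_batch_size=16,
    learning_rate=6e-6,
    lr_scheduler_type="linear",
    max_steps=50_000,
    dataloader_num_workers=32,
)
```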

## Training results

| step  | eval_loss | train_loss | eval_wer | eval_cer |
|------:|----------:|-----------:|---------:|---------:|
| 0     | 1.5034    | 1.6162     | 32.7040  | 11.9022  |
| 0     | 1.5034    | 1.6129     | 32.7040  | 11.9022  |
| 0     | 1.5034    | 1.5855     | 32.7040  | 11.9022  |
| 2500  | 0.9684    | 0.6679     | 21.7113  | 8.0826   |
| 5000  | 0.8986    | 0.6577     | 18.6358  | 7.3167   |
| 7500  | 0.7365    | 0.4619     | 16.3825  | 6.8027   |
| 10000 | 0.6429    | 0.4965     | 14.7990  | 6.2887   |
| 12500 | 0.6688    | 0.4602     | 13.7942  | 6.0217   |
| 15000 | 0.6509    | 0.4650     | 13.3069  | 5.9965   |
| 17500 | 0.5692    | 0.3979     | 12.8502  | 5.6790   |
| 20000 | 0.5530    | 0.3931     | 13.0938  | 5.8554   |
| 22500 | 0.5320    | 0.4441     | 12.5457  | 5.7596   |
| 25000 | 0.5109    | 0.4116     | 12.7893  | 5.8503   |
| 27500 | 0.4855    | 0.3728     | 12.9111  | 5.8856   |
| 30000 | 0.4720    | 0.3842     | 12.6066  | 5.8201   |
| 32500 | 0.4889    | 0.3051     | 12.4239  | 5.7244   |
| 35000 | 0.5312    | 0.3388     | 12.6066  | 5.9259   |
| 37500 | 0.5138    | 0.3409     | 12.3934  | 5.7999   |
| 40000 | 0.5214    | 0.2886     | 11.9367  | 5.5530   |
| 42500 | 0.5420    | 0.3431     | 12.6675  | 5.9914   |
| 45000 | 0.5263    | 0.4015     | 12.3934  | 5.9360   |
| 47500 | 0.5378    | 0.3218     | 12.1194  | 5.6185   |
| 49999 | 0.5299    | 0.3369     | 11.9976  | 5.6236   |
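
The eval_wer and eval_cer columns report word and character error rates in percent. As a point of reference, here is a minimal sketch of computing both metrics with the `evaluate` library; this is an assumption for illustration, as the card does not state which tooling produced the numbers above:

```python
import evaluate

# Hypothetical predictions and references, for illustration only
predictions = ["dette er en test"]
references = ["dette er en test av modellen"]

wer = evaluate.load("wer").compute(predictions=predictions, references=references)
cer = evaluate.load("cer").compute(predictions=predictions, references=references)
print(f"WER: {100 * wer:.4f}%, CER: {100 * cer:.4f}%")
```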

## Framework versions

- Transformers 4.28.0.dev0
- Datasets 2.11.0
- Tokenizers 0.13.3