---
extra_gated_prompt: >-
  This is a BETA model. To use this model, you agree to the [licensing
  terms](license.md).
language:
  - 'no'
license: apache-2.0
tags:
  - audio
  - asr
  - automatic-speech-recognition
  - hf-asr-leaderboard
model-index:
  - name: tiny_scream_april_beta
    results: []
---

# tiny_scream_april_beta

This model is a fine-tuned version of openai/whisper-tiny on the NbAiLab/NCC_speech_all_v5 dataset. It uses a beam size of 5.

## Model description

This is a BETA version. You need to accept the terms and conditions to use it.

## Using the Model

There are several ways to use this model, and we hope people will convert it into other formats. The code below lets you process long files with Transformers:

```python
import torch
import librosa
from transformers import pipeline

# Try "mps" for Metal (Mac), "cuda" if you have a GPU, and "cpu" otherwise
device = torch.device("cuda")

pipe = pipeline("automatic-speech-recognition",
                model="NbAiLab/tiny_scream_april_beta",
                chunk_length_s=30,
                device=device,
                max_new_tokens=128,
                generate_kwargs={"language": "no",  # "no" = Norwegian
                                 "task": "transcribe"})

# Load the audio file; librosa also reads mp3 and other formats.
# Whisper expects 16 kHz mono input.
audio_path = "myfile.wav"
samples, sample_rate = librosa.load(audio_path, sr=16000, mono=True)

# Run the pipeline on the raw samples
prediction = pipe(samples)["text"]

print(prediction)
```
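
The beam size of 5 mentioned above refers to decoding. If you want to reproduce it, beam search can be requested through `generate_kwargs`. The snippet below is a minimal sketch on top of the previous example; `num_beams` is a standard Transformers generation parameter, but this exact invocation is our assumption, not code from the model card:

```python
from transformers import pipeline

# Sketch: decode with beam search. num_beams=5 mirrors the beam size
# stated above; this invocation is our assumption, not from the card.
pipe_beam = pipeline("automatic-speech-recognition",
                     model="NbAiLab/tiny_scream_april_beta",
                     chunk_length_s=30,
                     device=device,  # same device as in the example above
                     generate_kwargs={"language": "no",
                                      "task": "transcribe",
                                      "num_beams": 5})

# `samples` is the 16 kHz mono array loaded in the example above
print(pipe_beam(samples)["text"])
```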

## Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 8e-05
- lr_scheduler_type: linear
- per_device_train_batch_size: 48
- total_train_batch_size_per_node: 192
- total_train_batch_size: 1536
- total_optimization_steps: 50000
- starting_optimization_step: None
- finishing_optimization_step: 50000
- num_train_dataset_workers: 64
- total_num_training_examples: 76800000
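
These totals are internally consistent: 48 examples per device × 4 devices per node gives 192 per node, 192 × 8 nodes gives 1536 per step, and 1536 × 50,000 steps gives the 76,800,000 training examples listed. The device and node counts are our inference from the ratios, not stated explicitly; a quick check:

```python
# Sanity check of the reported training totals (device/node counts
# are inferred from the ratios, not stated in the card).
per_device_batch = 48
per_node_batch = 192
total_batch = 1536
steps = 50_000

devices_per_node = per_node_batch // per_device_batch  # 4 (inferred)
nodes = total_batch // per_node_batch                  # 8 (inferred)
total_examples = total_batch * steps

assert total_examples == 76_800_000  # matches total_num_training_examples
print(devices_per_node, nodes, total_examples)
```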

## Training results

| step  | eval_loss | train_loss | eval_wer | eval_cer |
|------:|----------:|-----------:|---------:|---------:|
| 0     | 2.1853    | 2.6128     | 225.2741 | 151.0305 |
| 2500  | 0.8090    | 0.6776     | 26.0049  | 10.4006  |
| 5000  | 0.5674    | 0.5277     | 20.7674  | 8.7327   |
| 7500  | 0.5255    | 0.4551     | 19.3971  | 8.5059   |
| 10000 | 0.5774    | 0.4327     | 18.0877  | 8.0272   |

## Framework versions

- Transformers 4.28.0.dev0
- Datasets 2.11.0
- Tokenizers 0.13.2
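
To confirm a local environment matches these versions, a minimal check (exact dev/patch versions may differ):

```python
import transformers, datasets, tokenizers

# Versions the card reports: Transformers 4.28.0.dev0, Datasets 2.11.0,
# Tokenizers 0.13.2. Print yours to compare.
print("transformers:", transformers.__version__)
print("datasets:", datasets.__version__)
print("tokenizers:", tokenizers.__version__)
```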