---
extra_gated_prompt: "This is a BETA model. To use this model, you agree to the [licensing terms](license.md)."
language:
- 'no'
license: apache-2.0
tags:
- audio
- asr
- automatic-speech-recognition
- hf-asr-leaderboard
model-index:
- name: tiny_scream_april_beta
  results: []
---

# tiny_scream_april_beta

This model is a fine-tuned version of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) on the NbAiLab/NCC_speech_all_v5 dataset. It uses a beam size of 5 for decoding.

## Model description

This is a BETA version. You need to accept [the terms and conditions](license.md) to use it.

## Using the Model

There are several ways of using this model, and we hope people will convert it into different formats. The code below lets you process long audio files with Transformers:

```python
import torch
import librosa
from transformers import pipeline

# Use "mps" for Metal (Mac), "cuda" if you have a GPU, and "cpu" otherwise
device = torch.device("cuda")

pipe = pipeline(
    "automatic-speech-recognition",
    model="NbAiLab/tiny_scream_april_beta",
    chunk_length_s=30,
    device=device,
    max_new_tokens=128,
    # "no" = Norwegian, matching the model's language tag
    generate_kwargs={"language": "no", "task": "transcribe"},
)

# Load the audio file and resample to the 16 kHz mono input Whisper expects.
# librosa can also read mp3 files; just change the path.
audio_path = "myfile.wav"
samples, sample_rate = librosa.load(audio_path, sr=16000, mono=True)

# Run the pipeline
prediction = pipe(samples)["text"]
print(prediction)
```

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-05
- lr_scheduler_type: linear
- per_device_train_batch_size: 48
- total_train_batch_size_per_node: 192
- total_train_batch_size: 1536
- total_optimization_steps: 50000
- starting_optimization_step: None
- finishing_optimization_step: 50000
- num_train_dataset_workers: 64
- total_num_training_examples: 76800000

### Training results

| step  | eval_loss | train_loss | eval_wer (%) | eval_cer (%) |
|:-----:|:---------:|:----------:|:------------:|:------------:|
| 0     | 2.1853    | 2.6128     | 225.2741     | 151.0305     |
| 2500  | 0.8090    | 0.6776     | 26.0049      | 10.4006      |
| 5000  | 0.5674    | 0.5277     | 20.7674      | 8.7327       |
| 7500  | 0.5255    | 0.4551     | 19.3971      | 8.5059       |
| 10000 | 0.5774    | 0.4327     | 18.0877      | 8.0272       |

### Framework versions

- Transformers 4.28.0.dev0
- Datasets 2.11.0
- Tokenizers 0.13.2
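
### Batch-size arithmetic

As a sanity check, the batch-size figures in the training hyperparameters above are mutually consistent. The sketch below rederives them; note that the per-node device count and the node count are inferred from the reported totals, not stated explicitly in the card:

```python
# Sanity check on the reported training hyperparameters.
# devices_per_node and num_nodes are inferred, not stated in the card.
per_device_train_batch_size = 48
total_train_batch_size_per_node = 192
total_train_batch_size = 1536
total_optimization_steps = 50000

devices_per_node = total_train_batch_size_per_node // per_device_train_batch_size
num_nodes = total_train_batch_size // total_train_batch_size_per_node
total_examples = total_train_batch_size * total_optimization_steps

print(devices_per_node)  # 4 devices per node
print(num_nodes)         # 8 nodes
print(total_examples)    # 76800000, matching total_num_training_examples
```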