--- |
|
extra_gated_prompt: "This is a BETA model. To use this model, you must agree to the [licensing terms](license.md)."
|
language: |
|
- 'no' |
|
license: apache-2.0 |
|
tags: |
|
- audio |
|
- asr |
|
- automatic-speech-recognition |
|
- hf-asr-leaderboard |
|
model-index: |
|
- name: Small Scream - April Beta |
|
results: [] |
|
--- |
|
|
|
# Small Scream - April Beta |
|
This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the NbAiLab/NCC_speech_all_v5 dataset. Evaluation was performed with a beam size of 5.
|
It achieves the following results on the evaluation set (WER and CER in %; a sketch of how these metrics can be computed follows the list):
|
- step: 49999 |
|
- eval_loss: 0.5299 |
|
- train_loss: 0.3369 |
|
- eval_wer: 11.9976 |
|
- eval_cer: 5.6236 |
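
As a minimal sketch of how such metrics can be computed (not the actual evaluation script used for this model), the `evaluate` library provides `wer` and `cer` metrics. The Norwegian sentence pairs below are made-up placeholders, not examples from the evaluation set:

```python
import evaluate

# Hypothetical reference/prediction pairs; the real evaluation used NbAiLab/NCC_speech_all_v5
references = ["det var en fin dag i dag", "hun gikk til butikken"]
predictions = ["det var en fin dag i dag", "hun gikk til butikk"]

wer = evaluate.load("wer")  # word error rate
cer = evaluate.load("cer")  # character error rate

# Multiply by 100 to match the percentage scale reported above
print("WER:", 100 * wer.compute(predictions=predictions, references=references))
print("CER:", 100 * cer.compute(predictions=predictions, references=references))
```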
|
|
|
## Model description |
|
|
|
This is a BETA version. You need to accept [the terms and conditions](license.md) to use it.
|
|
|
|
|
## Using the Model |
|
There are several ways to use this model, and we hope people will convert it into other formats as well. The code below lets you process long files with Transformers:
|
|
|
```python
import torch
import librosa
from transformers import pipeline

# Try "mps" for Metal (Mac), "cuda" if you have a GPU, and "cpu" if not
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

pipe = pipeline(
    "automatic-speech-recognition",
    model="NbAiLab/small_scream_april_beta",
    chunk_length_s=30,  # split long audio into 30-second chunks
    device=device,
    max_new_tokens=128,
    # Add "num_beams": 5 here to match the beam size used for evaluation
    generate_kwargs={"language": "no", "task": "transcribe"},
)

# Load the WAV file resampled to 16 kHz mono, as Whisper expects.
# Modify this to use mp3 instead.
audio_path = "myfile.wav"
samples, sample_rate = librosa.load(audio_path, sr=16000, mono=True)

# Run the pipeline and print the transcription
prediction = pipe(samples)["text"]
print(prediction)
```
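
The pipeline handles chunking of long files for you. If you prefer working with the model directly, a minimal sketch using the standard Transformers Whisper classes is shown below; note that a plain `generate` call only covers about 30 seconds of audio, and `num_beams=5` mirrors the beam size reported above:

```python
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("NbAiLab/small_scream_april_beta")
model = WhisperForConditionalGeneration.from_pretrained("NbAiLab/small_scream_april_beta")

# Load the audio resampled to 16 kHz mono, as Whisper expects
samples, _ = librosa.load("myfile.wav", sr=16000, mono=True)

# Convert the waveform to log-mel input features
inputs = processor(samples, sampling_rate=16000, return_tensors="pt")

# Beam search with beam size 5, matching the evaluation setup
predicted_ids = model.generate(
    inputs.input_features,
    num_beams=5,
    language="no",
    task="transcribe",
)

print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```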
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training (a rough sketch of equivalent Transformers training arguments follows the list):
|
- learning_rate: 6e-06 |
|
- lr_scheduler_type: linear |
|
- per_device_train_batch_size: 16 |
|
- total_train_batch_size_per_node: 64 |
|
- total_train_batch_size: 64 |
|
- total_optimization_steps: 50000 |
|
- starting_optimization_step: None |
|
- finishing_optimization_step: 50000 |
|
- num_train_dataset_workers: 32 |
|
- total_num_training_examples: 3200000 |
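
The exact training script is not included in this card. As a rough, hedged sketch only, the core hyperparameters above map onto Transformers `Seq2SeqTrainingArguments` roughly like this (the output path is hypothetical, and several fields above have no one-to-one equivalent):

```python
from transformers import Seq2SeqTrainingArguments

# A sketch mirroring the listed hyperparameters; not the actual training configuration
training_args = Seq2SeqTrainingArguments(
    output_dir="./small_scream_april_beta",  # hypothetical output path
    learning_rate=6e-6,
    lr_scheduler_type="linear",
    per_device_train_batch_size=16,  # with 4 devices this gives the total batch size of 64
    max_steps=50_000,                # total_optimization_steps
    dataloader_num_workers=32,       # num_train_dataset_workers
)
```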
|
|
|
### Training results |
|
|
|
| step | eval_loss | train_loss | eval_wer | eval_cer | |
|
|:-----:|:---------:|:----------:|:--------:|:--------:| |
|
| 0 | 1.5034 | 1.6162 | 32.7040 | 11.9022 | |
|
| 0 | 1.5034 | 1.6129 | 32.7040 | 11.9022 | |
|
| 0 | 1.5034 | 1.5855 | 32.7040 | 11.9022 | |
|
| 2500 | 0.9684 | 0.6679 | 21.7113 | 8.0826 | |
|
| 5000 | 0.8986 | 0.6577 | 18.6358 | 7.3167 | |
|
| 7500 | 0.7365 | 0.4619 | 16.3825 | 6.8027 | |
|
| 10000 | 0.6429 | 0.4965 | 14.7990 | 6.2887 | |
|
| 12500 | 0.6688 | 0.4602 | 13.7942 | 6.0217 | |
|
| 15000 | 0.6509 | 0.4650 | 13.3069 | 5.9965 | |
|
| 17500 | 0.5692 | 0.3979 | 12.8502 | 5.6790 | |
|
| 20000 | 0.5530 | 0.3931 | 13.0938 | 5.8554 | |
|
| 22500 | 0.5320 | 0.4441 | 12.5457 | 5.7596 | |
|
| 25000 | 0.5109 | 0.4116 | 12.7893 | 5.8503 | |
|
| 27500 | 0.4855 | 0.3728 | 12.9111 | 5.8856 | |
|
| 30000 | 0.4720 | 0.3842 | 12.6066 | 5.8201 | |
|
| 32500 | 0.4889 | 0.3051 | 12.4239 | 5.7244 | |
|
| 35000 | 0.5312 | 0.3388 | 12.6066 | 5.9259 | |
|
| 37500 | 0.5138 | 0.3409 | 12.3934 | 5.7999 | |
|
| 40000 | 0.5214 | 0.2886 | 11.9367 | 5.5530 | |
|
| 42500 | 0.5420 | 0.3431 | 12.6675 | 5.9914 | |
|
| 45000 | 0.5263 | 0.4015 | 12.3934 | 5.9360 | |
|
| 47500 | 0.5378 | 0.3218 | 12.1194 | 5.6185 | |
|
| 49999 | 0.5299 | 0.3369 | 11.9976 | 5.6236 | |
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.28.0.dev0 |
|
- Datasets 2.11.0 |
|
- Tokenizers 0.13.3 |
|
|