This is a fine-tuned version of the openai/whisper-small pre-trained model for ASR in Galician.
The training data combines two datasets.
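As an illustration of how two corpora can be merged with the `datasets` library (the dataset identifiers below are placeholders, not the actual corpora used for this model):

```python
from datasets import load_dataset, concatenate_datasets

# Placeholder identifiers for illustration only; substitute the real corpora.
ds_a = load_dataset("corpus_a", split="train")
ds_b = load_dataset("corpus_b", split="train")

# Keep only the columns the two corpora share so their schemas match,
# then concatenate them into a single training split.
shared = sorted(set(ds_a.column_names) & set(ds_b.column_names))
combined = concatenate_datasets(
    [ds_a.select_columns(shared), ds_b.select_columns(shared)]
)
```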
To transcribe an audio file with this model:

```python
import torch
import librosa
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

filename = "demo.wav"  # change this line to the name of your audio file
sample_rate = 16_000   # Whisper expects 16 kHz audio

processor = AutoProcessor.from_pretrained("ITG/whisper-small-gl")
model = AutoModelForSpeechSeq2Seq.from_pretrained("ITG/whisper-small-gl")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Load the audio file, resampling it to the expected rate
speech_array, _ = librosa.load(filename, sr=sample_rate)

with torch.no_grad():
    # Convert the waveform into log-Mel spectrogram input features
    inputs = processor(speech_array, sampling_rate=sample_rate, return_tensors="pt").to(device)
    input_features = inputs.input_features
    # Generate token ids and decode them into the transcription
    generated_ids = model.generate(input_features, max_length=225)
    decode_output = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(f"ASR Galician whisper-small output: {decode_output}")
```
| Hyper-parameter | Value |
|---|---|
| Training batch size | 16 |
| Evaluation batch size | 8 |
| Learning rate | 1e-5 |
| Gradient checkpointing | true |
| Gradient accumulation steps | 1 |
| Max training epochs | 100 |
| Max steps | 4000 |
| Generate max length | 225 |
| Warmup training steps | 12.5% |
| FP16 | true |
| Metric for best model | wer |
| Greater is better | false |
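As a rough sketch, these values map onto `transformers.Seq2SeqTrainingArguments` as follows; `output_dir` and the evaluation/saving cadence are assumptions, not values from this card, and the warmup of 12.5% of the 4000 max steps works out to 500 steps:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-gl",  # assumption, any path works
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    learning_rate=1e-5,
    gradient_checkpointing=True,
    gradient_accumulation_steps=1,
    num_train_epochs=100,
    max_steps=4000,                   # takes precedence over num_train_epochs
    generation_max_length=225,
    warmup_steps=500,                 # 12.5% of 4000 training steps
    fp16=True,
    evaluation_strategy="steps",      # assumption
    eval_steps=500,                   # assumption
    save_steps=500,                   # assumption
    load_best_model_at_end=True,      # needed so "best model" selection applies
    metric_for_best_model="wer",
    greater_is_better=False,
    predict_with_generate=True,       # needed to compute WER during evaluation
)
```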
If you're interested in fine-tuning your own Whisper model, we suggest starting from the openai/whisper-small checkpoint. You may also find the Transformers step-by-step guide for fine-tuning Whisper on multilingual ASR datasets a valuable resource; it served as a helpful reference during the training of this Galician whisper-small model!
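As a minimal starting point (a sketch assuming the Transformers Whisper classes; the `language` and `task` arguments configure the processor for Galician transcription):

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Load the pre-trained checkpoint and a processor set up for
# Galician transcription as the starting point for fine-tuning
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="galician", task="transcribe"
)
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
```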