metadata
license: cc-by-nc-nd-4.0
datasets:
- openslr
language:
- gl
pipeline_tag: automatic-speech-recognition
tags:
- ITG
- PyTorch
- Transformers
- whisper
- whisper-base
whisper-base-gl
Description
This is a fine-tuned version of the openai/whisper-base pre-trained model for automatic speech recognition (ASR) in Galician.
Dataset
We used one of the datasets available in the OpenSLR repository: the OpenSLR Galician dataset.
Example inference script
Use the following example script to run the model in inference mode:
import librosa  # used to load and resample the audio file
import torch
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

filename = "demo.wav"  # change this to the path of your audio file
sample_rate = 16_000  # the audio is resampled to 16 kHz on load

# Load the processor (feature extractor + tokenizer) and the fine-tuned model
processor = AutoProcessor.from_pretrained('ITG/whisper-base-gl')
model = AutoModelForSpeechSeq2Seq.from_pretrained('ITG/whisper-base-gl')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

with torch.no_grad():
    # Read the audio as a mono float array at the target sample rate
    speech_array, _ = librosa.load(filename, sr=sample_rate)
    inputs = processor(speech_array, sampling_rate=sample_rate, return_tensors="pt").to(device)
    input_features = inputs.input_features
    # Generate token ids and decode them into text
    generated_ids = model.generate(inputs=input_features, max_length=225)
    decode_output = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(f"ASR Galician whisper-base output: {decode_output}")
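
To check a transcription produced by the script above against a known reference transcript, word error rate (WER) can be computed by hand. WER is the metric this card lists for selecting the best model. The function below is a minimal sketch, not the evaluation code used for this model: the name `word_error_rate` is hypothetical, and lowercasing plus whitespace splitting is an assumed normalization that may differ from the one used during training.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder strings; replace them with a real reference and model output.
print(word_error_rate("a b c d", "a x c"))
```

In practice, `decode_output` from the inference script would be passed as the `hypothesis` argument.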
Fine-tuning hyper-parameters
Hyper-parameter | Value |
---|---|
Training batch size | 16 |
Evaluation batch size | 8 |
Learning rate | 3e-5 |
Gradient checkpointing | true |
Gradient accumulation steps | 1 |
Max training epochs | 100 |
Max steps | 4000 |
Generate max length | 225 |
Warmup training steps (%) | 12.5% |
FP16 | true |
Metric for best model | wer |
Greater is better | false |
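
The values in the table map fairly directly onto the keyword arguments of `Seq2SeqTrainingArguments` in Transformers. The following is a sketch under stated assumptions rather than the configuration actually used: the output path and the helper `build_training_args` are hypothetical, the batch sizes are assumed to be per device, `predict_with_generate` is assumed because WER is computed on generated text, and the warmup percentage is assumed to be relative to the maximum number of steps.

```python
MAX_STEPS = 4000
WARMUP_FRACTION = 0.125  # 12.5% from the table, assumed relative to MAX_STEPS

training_kwargs = {
    "output_dir": "./whisper-base-gl",    # hypothetical output path
    "per_device_train_batch_size": 16,    # assumption: batch size is per device
    "per_device_eval_batch_size": 8,      # assumption: batch size is per device
    "learning_rate": 3e-5,
    "gradient_checkpointing": True,
    "gradient_accumulation_steps": 1,
    "num_train_epochs": 100,
    "max_steps": MAX_STEPS,               # a positive max_steps overrides num_train_epochs
    "generation_max_length": 225,
    "warmup_steps": int(MAX_STEPS * WARMUP_FRACTION),
    "fp16": True,                         # needs a GPU that supports half precision
    "predict_with_generate": True,        # assumption: generate text so WER can be computed
    "metric_for_best_model": "wer",
    "greater_is_better": False,           # lower WER is better
}


def build_training_args():
    """Hypothetical helper: create the training arguments (requires transformers)."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**training_kwargs)


print(training_kwargs["warmup_steps"])
```

Because `max_steps` is positive, training stops after 4000 optimizer steps regardless of the epoch limit.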
Fine-tuning on a different dataset or style
If you want to fine-tune your own Whisper model, we suggest starting from the openai/whisper-base model. The Transformers step-by-step guide for fine-tuning Whisper on multilingual ASR datasets is also a valuable resource; it served as a reference while training this Galician whisper-base model.