---
license: cc-by-nc-nd-4.0
datasets:
- openslr
language:
- gl
pipeline_tag: automatic-speech-recognition
tags:
- ITG
- PyTorch
- Transformers
- whisper
- whisper-base
---

# Whisper Base Galician

## Description
  
This is a fine-tuned version of the [openai/whisper-base](https://huggingface.co/openai/whisper-base) pre-trained model for automatic speech recognition (ASR) in Galician.

---

## Dataset 

We used one of the datasets available in the OpenSLR repository: the [OpenSLR Galician corpus (SLR77)](https://huggingface.co/datasets/openslr/viewer/SLR77).
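
For reference, the corpus can be pulled through the Hugging Face `datasets` library. The exact split and preprocessing used for this model are not described here, so treat the following as a minimal sketch:

```python
from datasets import load_dataset

# Minimal sketch: load the Galician OpenSLR corpus (SLR77) from the Hugging Face Hub.
# Newer `datasets` versions may require trust_remote_code=True for script-based datasets;
# the split and the inspection below are illustrative only.
slr77 = load_dataset("openslr", "SLR77", split="train", trust_remote_code=True)
print(slr77)     # dataset size and column names
print(slr77[0])  # one example with its audio array and transcription
```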

--- 


## Example inference script

### Check this example script to run our model in inference mode

```python
import librosa
import torch
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

filename = "demo.wav"  # change this line to the name of your audio file
sample_rate = 16_000   # Whisper expects 16 kHz audio
processor = AutoProcessor.from_pretrained('ITG/whisper-base-gl')
model = AutoModelForSpeechSeq2Seq.from_pretrained('ITG/whisper-base-gl')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

with torch.no_grad():
    # Load and resample the audio, then extract log-Mel input features
    speech_array, _ = librosa.load(filename, sr=sample_rate)
    inputs = processor(speech_array, sampling_rate=sample_rate, return_tensors="pt").to(device)
    input_features = inputs.input_features
    generated_ids = model.generate(inputs=input_features, max_length=225)
    decode_output = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(f"ASR Galician whisper-base output: {decode_output}")
```
---

## Fine-tuning hyper-parameters

|            **Hyper-parameter**           |          **Value**          |
|:----------------------------------------:|:---------------------------:|
|            Training batch size           |             16              |
|           Evaluation batch size          |             8               |
|               Learning rate              |             3e-5            |
|           Gradient checkpointing         |             true            |
|         Gradient accumulation steps      |             1               | 
|            Max training epochs           |             100             |
|                Max steps                 |             4000            |
|            Generate max length           |             225             |
|         Warmup training steps (%)        |             12.5%           |
|                  FP16                    |             true            |
|          Metric for best model           |             wer             |
|            Greater is better             |             false           |
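
These values map roughly onto a `Seq2SeqTrainingArguments` configuration from the Transformers `Trainer` API. The exact arguments used for this model are not published, so the sketch below is an assumption based on the table above:

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical mapping of the table above onto Transformers training arguments;
# output_dir and predict_with_generate are illustrative assumptions, the rest
# mirrors the reported hyper-parameters.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-base-gl",   # assumed output path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    learning_rate=3e-5,
    gradient_checkpointing=True,
    gradient_accumulation_steps=1,
    num_train_epochs=100,
    max_steps=4000,                   # takes precedence over num_train_epochs
    generation_max_length=225,
    warmup_ratio=0.125,               # 12.5% warmup steps
    fp16=True,
    predict_with_generate=True,
    metric_for_best_model="wer",
    greater_is_better=False,
)
```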


## Fine-tuning in a different dataset or style

If you're interested in fine-tuning your own Whisper model, we suggest starting from the [openai/whisper-base model](https://huggingface.co/openai/whisper-base). You may also find the Transformers
step-by-step guide on [fine-tuning Whisper for multilingual ASR datasets](https://huggingface.co/blog/fine-tune-whisper) a valuable resource; it served as a helpful reference during the training
of this Galician whisper-base model!
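
As a minimal starting point (our exact training pipeline is not reproduced here), loading the base checkpoint and configuring it for Galician looks roughly like this, following the blog guide above:

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Sketch: load the multilingual base checkpoint and configure it for Galician.
# The processor handles both feature extraction and tokenization for the
# target language and task.
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-base", language="galician", task="transcribe"
)
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")

# As recommended in the fine-tuning guide, do not force language/task tokens
# during training so the model learns to predict them itself.
model.config.forced_decoder_ids = None
model.config.suppress_tokens = []
```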