ITG/whisper-base-gl

rgomez-itg committed on Jul 5, 2023

Commit 55153c5 • 1 Parent(s): 4a83b85

Update README.md

Files changed (1)

README.md +78 -0

@@ -1,3 +1,81 @@
 ---
 license: cc-by-nc-nd-4.0
+datasets:
+- openslr
+language:
+- gl
+pipeline_tag: automatic-speech-recognition
+tags:
+- ITG
+- PyTorch
+- Transformers
+- whisper
+- whisper-base
 ---
+# whisper-base-gl
+## Description
+This is a fine-tuned version of the [openai/whisper-base](https://huggingface.co/openai/whisper-base) pre-trained model for automatic speech recognition (ASR) in Galician.
+---
+## Dataset
+We used one of the datasets available in the OpenSLR repository: [OpenSLR Galician](https://huggingface.co/datasets/openslr/viewer/SLR77).
+---
+## Example inference script
+### Check this example script to run our model in inference mode
+```python
+import librosa
+import torch
+from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
+
+filename = "demo.wav"  # change this to the path of your audio file
+sample_rate = 16_000   # Whisper expects 16 kHz audio
+
+# Load the processor (feature extractor + tokenizer) and the fine-tuned model
+processor = AutoProcessor.from_pretrained('ITG/whisper-base-gl')
+model = AutoModelForSpeechSeq2Seq.from_pretrained('ITG/whisper-base-gl')
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+model.to(device)
+
+# Load the audio file and resample it to the expected sampling rate
+speech_array, _ = librosa.load(filename, sr=sample_rate)
+
+with torch.no_grad():
+    inputs = processor(speech_array, sampling_rate=sample_rate, return_tensors="pt").to(device)
+    # The features are passed positionally so the call does not depend on the keyword name
+    generated_ids = model.generate(inputs.input_features, max_length=225)
+    decode_output = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
+
+print(f"ASR Galician whisper-base output: {decode_output}")
+```
+---
+## Fine-tuning hyper-parameters
+|            **Hyper-parameter**           |          **Value**          |
+|:----------------------------------------:|:---------------------------:|
+|            Training batch size           |             16              |
+|           Evaluation batch size          |             8               |
+|               Learning rate              |             3e-5            |
+|           Gradient checkpointing         |             true            |
+|         Gradient accumulation steps      |             1               |
+|            Max training epochs           |             100             |
+|                Max steps                 |             4000            |
+|            Generate max length           |             225             |
+|         Warmup training steps (%)        |             12.5%           |
+|                  FP16                    |             true            |
+|          Metric for best model           |             wer             |
+|            Greater is better             |             false           |
+## Fine-tuning on a different dataset or style
+If you're interested in fine-tuning your own Whisper model, we suggest starting with the [openai/whisper-base model](https://huggingface.co/openai/whisper-base). You may also find the Transformers
+step-by-step guide for [fine-tuning Whisper on multilingual ASR datasets](https://huggingface.co/blog/fine-tune-whisper) a valuable resource. We used this guide as a reference while training
+this Galician whisper-base model!
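
The hyper-parameter table above could be written as keyword arguments for `Seq2SeqTrainingArguments` from Transformers. The following is only a sketch and is not taken from the authors' training script. The mapping of table rows to argument names is an assumption, `output_dir` is a hypothetical path, and the warmup value assumes that 12.5% refers to the 4000 maximum steps. Settings the table does not mention (evaluation schedule, `predict_with_generate`, and so on) are left out.

```python
# Sketch only: maps the table values onto Seq2SeqTrainingArguments keyword
# names. The mapping itself is an assumption, not the authors' configuration.

MAX_STEPS = 4000          # "Max steps" row
WARMUP_FRACTION = 0.125   # "Warmup training steps (%)" row, i.e. 12.5%


def warmup_steps(max_steps: int, fraction: float) -> int:
    """Convert a warmup percentage into an absolute number of steps."""
    return int(max_steps * fraction)


training_kwargs = {
    "output_dir": "./whisper-base-gl",   # hypothetical path, not in the source
    "per_device_train_batch_size": 16,   # "Training batch size"
    "per_device_eval_batch_size": 8,     # "Evaluation batch size"
    "learning_rate": 3e-5,
    "gradient_checkpointing": True,
    "gradient_accumulation_steps": 1,
    "num_train_epochs": 100,             # "Max training epochs"
    "max_steps": MAX_STEPS,              # a positive max_steps takes precedence over epochs
    "generation_max_length": 225,        # "Generate max length"
    "warmup_steps": warmup_steps(MAX_STEPS, WARMUP_FRACTION),
    "fp16": True,
    "metric_for_best_model": "wer",
    "greater_is_better": False,          # lower WER is better
}

if __name__ == "__main__":
    for name, value in training_kwargs.items():
        print(f"{name} = {value}")
    # With transformers installed, the dictionary could be passed on as:
    # from transformers import Seq2SeqTrainingArguments
    # args = Seq2SeqTrainingArguments(**training_kwargs)
```

Under these assumptions the warmup phase covers 500 of the 4000 steps. Because the Trainer stops at `max_steps` when it is positive, the 100-epoch limit acts only as an upper bound.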