---
license: cc-by-nc-sa-4.0
base_model: utter-project/mHuBERT-147
datasets:
- FBK-MT/Speech-MASSIVE
- FBK-MT/Speech-MASSIVE-test
- mozilla-foundation/common_voice_17_0
- google/fleurs
language:
- fr
metrics:
- wer
- cer
pipeline_tag: automatic-speech-recognition
---

This is a CTC-based Automatic Speech Recognition (ASR) model for French, built on top of the mHuBERT-147 speech foundation model. It is part of the SLU demo available here: [LINK TO THE DEMO GOES HERE]
- Training data: XX hours
- Normalization: Whisper text normalization (see the sketch after this list)
- Performance:
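
For scoring, the transcripts are normalized with the Whisper normalizer. Below is a minimal sketch of how this is typically applied with the `BasicTextNormalizer` shipped with the `transformers` library; assuming that this is the variant used here.

```python
# Sketch: Whisper-style text normalization before computing WER/CER.
# Assumes the BasicTextNormalizer from transformers is the variant used;
# it lowercases, strips punctuation, and collapses whitespace.
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

normalizer = BasicTextNormalizer()

reference = "Bonjour, comment ça va ?"
hypothesis = "bonjour comment ça va"

print(normalizer(reference))   # normalized reference
print(normalizer(hypothesis))  # normalized hypothesis
```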
## Table of Contents

- Training Parameters
- ASR Model class
- Running inference
## Training Parameters
The training parameters are available in `config.yaml`. We downsample the Common Voice dataset to 70,000 utterances.
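
A minimal sketch of that downsampling step using the `datasets` library (the seed and split name are assumptions; the actual recipe is defined by `config.yaml`):

```python
# Sketch: keep a random 70,000-utterance subset of the Common Voice French
# training split. Seed and split name are assumptions; see config.yaml for
# the actual training recipe.
from datasets import load_dataset

common_voice = load_dataset("mozilla-foundation/common_voice_17_0", "fr", split="train")
common_voice = common_voice.shuffle(seed=42).select(range(70_000))
print(len(common_voice))  # 70000
```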
## ASR Model class
We use the `mHubertForCTC` class for our model, which is nearly identical to the existing `HubertForCTC` class. The key difference is that we add a few extra hidden layers at the end of the Transformer stack, just before the `lm_head`. The code is available in `CTC_model.py`.
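
For illustration only, here is a minimal sketch of the idea (the number and size of the extra layers are assumptions; the actual implementation is the one in `CTC_model.py`):

```python
# Illustrative sketch only; the real implementation lives in CTC_model.py.
import torch.nn as nn
from transformers import HubertConfig, HubertForCTC


class MHubertForCTC(HubertForCTC):
    """Sketch of a HubertForCTC variant with extra layers before the CTC head."""

    def __init__(self, config: HubertConfig):
        super().__init__(config)
        hidden = config.hidden_size
        # Extra hidden layers applied to the encoder output right before the
        # final vocabulary projection; the parent forward() is reused as-is.
        self.lm_head = nn.Sequential(
            nn.Linear(hidden, hidden),
            nn.GELU(),
            nn.Linear(hidden, hidden),
            nn.GELU(),
            nn.Linear(hidden, config.vocab_size),
        )
        self.post_init()
```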
## Running inference
The `run_asr.py` file illustrates how to load the model for inference (`load_asr_model`) and how to produce a transcription for an audio file (`run_asr_inference`). Please install the dependencies listed in the requirements file to avoid incorrect model loading.
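
A hedged usage sketch (the exact function signatures, checkpoint path, and audio file below are placeholders; refer to `run_asr.py` itself for the real interface):

```python
# Sketch of the intended usage of run_asr.py; argument names and the
# checkpoint/audio paths below are placeholders, not the actual signatures.
from run_asr import load_asr_model, run_asr_inference

model = load_asr_model("path/to/this/checkpoint")   # placeholder path
transcript = run_asr_inference(model, "audio.wav")  # placeholder audio file
print(transcript)
```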