---
license: cc-by-nc-sa-4.0
base_model: utter-project/mHuBERT-147
datasets:
- FBK-MT/Speech-MASSIVE
- FBK-MT/Speech-MASSIVE-test
- mozilla-foundation/common_voice_17_0
- google/fleurs
language:
- fr
metrics:
- wer
- cer
pipeline_tag: automatic-speech-recognition
---

**This is a CTC-based Automatic Speech Recognition system for French.**

This model is part of the SLU demo available here: [LINK TO THE DEMO GOES HERE]

It is based on the [mHuBERT-147](https://huggingface.co/utter-project/mHuBERT-147) speech foundation model.

* Training data: XX hours
* Normalization: Whisper normalization
* Performance:

# Table of Contents:
1. Training Parameters
2. [ASR Model class](https://huggingface.co/naver/mHuBERT-147-ASR-fr#ASR-Model-class)
3. Running inference

## Training Parameters

The training parameters are available in `config.yaml`. We downsample the Common Voice dataset to 70,000 utterances.

## ASR Model class

We use the mHubertForCTC class for our model, which is nearly identical to the existing HubertForCTC class. The key difference is that we add a few extra hidden layers at the end of the Transformer stack, just before the lm_head. The code is available in [CTC_model.py](https://huggingface.co/naver/mHuBERT-147-ASR-fr/blob/main/CTC_model.py); a minimal illustrative sketch of the idea is also included at the end of this card.

## Running inference

The `run_asr.py` file illustrates how to load the model for inference (**load_asr_model**) and how to produce a transcription for an audio file (**run_asr_inference**). Please install the dependencies listed in `requirements.txt` to avoid incorrect model loading.
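
Below is a minimal usage sketch built around the helpers named above. The function names come from `run_asr.py`; the exact signatures, arguments, and return values shown here are assumptions and may differ from the actual script.

```python
# Minimal usage sketch (assumed signatures; see run_asr.py for the real ones).
from run_asr import load_asr_model, run_asr_inference

# Assumed to return the CTC model (and any processor it needs) ready for inference.
model = load_asr_model("naver/mHuBERT-147-ASR-fr")

# Assumed to take the loaded model and a path to a French audio file
# (illustrative path) and return the decoded transcription as a string.
transcription = run_asr_inference(model, "example_fr.wav")
print(transcription)
```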
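
For reference, here is a minimal, illustrative sketch of the architectural idea described in the ASR Model class section: a HubertForCTC-style model with a few extra hidden layers inserted after the Transformer stack, just before the lm_head. The class name, layer count, and layer shapes below are hypothetical; the actual implementation is in [CTC_model.py](https://huggingface.co/naver/mHuBERT-147-ASR-fr/blob/main/CTC_model.py).

```python
from torch import nn
from transformers import HubertModel, HubertPreTrainedModel


class SketchHubertForCTC(HubertPreTrainedModel):
    """Illustrative sketch only; not the actual mHubertForCTC implementation."""

    def __init__(self, config, num_extra_layers=2):
        super().__init__(config)
        self.hubert = HubertModel(config)
        self.dropout = nn.Dropout(config.final_dropout)
        # Extra hidden layers appended after the Transformer stack (hypothetical sizes).
        self.extra_layers = nn.Sequential(
            *[
                nn.Sequential(nn.Linear(config.hidden_size, config.hidden_size), nn.GELU())
                for _ in range(num_extra_layers)
            ]
        )
        # CTC output projection over the vocabulary.
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size)
        self.post_init()

    def forward(self, input_values, attention_mask=None):
        hidden_states = self.hubert(input_values, attention_mask=attention_mask).last_hidden_state
        hidden_states = self.dropout(hidden_states)
        hidden_states = self.extra_layers(hidden_states)
        # Returns per-frame CTC logits; loss computation is omitted in this sketch.
        return self.lm_head(hidden_states)
```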