---
license: cc-by-nc-sa-4.0
base_model: utter-project/mHuBERT-147
datasets:
- FBK-MT/Speech-MASSIVE
- FBK-MT/Speech-MASSIVE-test
- mozilla-foundation/common_voice_17_0
- google/fleurs
language:
- fr
metrics:
- wer
- cer
pipeline_tag: automatic-speech-recognition
---
**This is a CTC-based Automatic Speech Recognition system for French.**
This model is part of the SLU demo available here: [LINK TO THE DEMO GOES HERE]
It is based on the [mHuBERT-147](https://huggingface.co/utter-project/mHuBERT-147) speech foundation model.
* Training data: XX hours
* Normalization: Whisper normalization
* Performance:
# Table of Contents:
1. Training Parameters
2. [ASR Model class](https://huggingface.co/naver/mHuBERT-147-ASR-fr#ASR-Model-class)
3. Running inference
## Training Parameters
The training parameters are available in config.yaml.
We downsample the Common Voice dataset to 70,000 utterances.
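
The downsampling step can be pictured with the minimal sketch below. It assumes uniform random sampling with a fixed seed, which this card does not confirm; the function name `downsample`, the seed, and the placeholder IDs are hypothetical and not taken from config.yaml.

```python
import random


def downsample(utterance_ids, target_size=70_000, seed=0):
    """Return a reproducible random subset of at most `target_size` items.

    Assumption: uniform sampling without replacement. The selection
    strategy actually used for this model is not documented here.
    """
    ids = list(utterance_ids)
    if len(ids) <= target_size:
        # Nothing to drop: the dataset is already small enough.
        return ids
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    return rng.sample(ids, target_size)


# Hypothetical usage: integer placeholders stand in for utterance IDs.
subset = downsample(range(100_000))
print(len(subset))  # 70000
```

Fixing the seed keeps the subset identical across runs, which matters when comparing training configurations on the same data.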
## ASR Model class
We use the mHubertForCTC class for our model, which is nearly identical to the existing HubertForCTC class.
The key difference is a few additional hidden layers at the end of the Transformer stack, just before the lm_head.
The code is available in [CTC_model.py](https://huggingface.co/naver/mHuBERT-147-ASR-fr/blob/main/CTC_model.py).
## Running inference
The run_asr.py file illustrates how to load the model for inference (**load_asr_model**) and how to produce a transcription for a file (**run_asr_inference**).
Please install the dependencies listed in requirements.txt to avoid incorrect model loading.
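
Because the model is CTC-based, its per-frame predictions must be collapsed into text. The sketch below shows the standard greedy CTC rule (merge consecutive repeats, then drop blanks) on a toy example. It is not the code from run_asr.py: the function name `ctc_greedy_decode`, the toy vocabulary, and the blank index 0 are assumptions made for illustration.

```python
def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0):
    """Collapse per-frame CTC predictions into a string.

    Greedy CTC rule: keep a token only when it differs from the
    previous frame's prediction and is not the blank symbol.
    """
    decoded = []
    previous = None
    for idx in frame_ids:
        if idx != previous and idx != blank_id:
            decoded.append(id_to_token[idx])
        previous = idx
    return "".join(decoded)


# Toy vocabulary (hypothetical; the real one ships with the model files).
vocab = {0: "<blank>", 1: "s", 2: "a", 3: "l", 4: "u", 5: "t"}

# Per-frame argmax indices, as would be obtained from the model's logits.
frames = [1, 1, 0, 2, 0, 3, 3, 4, 0, 0, 5]
print(ctc_greedy_decode(frames, vocab))  # salut
```

A blank between two identical indices keeps both characters, which is how CTC represents doubled letters: `[3, 0, 3]` decodes to "ll", whereas `[3, 3]` decodes to "l".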