---
license: cc-by-nc-sa-4.0
base_model: utter-project/mHuBERT-147
datasets:
- FBK-MT/Speech-MASSIVE
- FBK-MT/Speech-MASSIVE-test
- mozilla-foundation/common_voice_17_0
- google/fleurs
language:
- fr
metrics:
- wer
- cer
pipeline_tag: automatic-speech-recognition
---
|
|
|
**This is a CTC-based Automatic Speech Recognition system for French.**

This model is part of the SLU demo available here: [LINK TO THE DEMO GOES HERE]

It is based on the [mHuBERT-147](https://huggingface.co/utter-project/mHuBERT-147) speech foundation model.
|
|
|
* Training data: XX hours
* Normalization: Whisper normalization
* Performance:
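The Whisper normalization mentioned above can be approximated as follows. This is an illustrative sketch only: the actual pipeline presumably uses the multilingual `BasicTextNormalizer` shipped with Whisper, which handles additional symbol and diacritic edge cases.

```python
import re
import unicodedata

def basic_normalize(text: str) -> str:
    """Rough approximation of Whisper's basic (multilingual) normalizer:
    lowercase, strip punctuation/symbols, collapse whitespace.
    Illustrative only; not the exact normalizer used for this model."""
    text = unicodedata.normalize("NFKC", text.lower())
    text = re.sub(r"[^\w\s]", " ", text)      # drop punctuation, keep accented letters
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace

print(basic_normalize("Bonjour, le Monde !"))  # → bonjour le monde
```

Note that normalization is applied to both references and hypotheses before computing WER/CER, so scores are not comparable to unnormalized evaluations.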
|
|
|
|
|
# Table of Contents
1. Training Parameters
2. [ASR Model class](https://huggingface.co/naver/mHuBERT-147-ASR-fr#ASR-Model-class)
3. Running inference
|
|
|
## Training Parameters

The training parameters are available in `config.yaml`.

We downsample the CommonVoice dataset to 70,000 utterances.
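A reproducible way to take such a subset can be sketched as below. The sampling procedure and seed are not specified in this card; the helper is hypothetical.

```python
import random

def sample_indices(num_rows: int, k: int, seed: int = 42) -> list[int]:
    """Pick k distinct row indices reproducibly (hypothetical helper;
    the actual subsampling procedure used for training is not documented here)."""
    return sorted(random.Random(seed).sample(range(num_rows), k))

# With the Hugging Face `datasets` library, the subset could then be taken as:
#   ds = load_dataset("mozilla-foundation/common_voice_17_0", "fr", split="train")
#   ds = ds.select(sample_indices(ds.num_rows, 70_000))
```

Sorting the indices keeps the subset in the original row order, which makes later inspection and caching more predictable.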
|
|
|
## ASR Model class

We use the `mHubertForCTC` class for our model, which is nearly identical to the existing `HubertForCTC` class.
The key difference is that we add a few additional hidden layers at the end of the Transformer stack, just before the `lm_head`.
The code is available in [CTC_model.py](https://huggingface.co/naver/mHuBERT-147-ASR-fr/blob/main/CTC_model.py).
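The idea of inserting extra hidden layers before the CTC head can be sketched in plain PyTorch as follows. The layer count, sizes, and activation are illustrative; the real configuration is defined in `CTC_model.py` and `config.yaml`.

```python
import torch
import torch.nn as nn

class ExtraLayersCTCHead(nn.Module):
    """Sketch of the head described above: a few extra hidden layers
    inserted between the Transformer output and the lm_head.
    Dimensions and depth are illustrative, not the released configuration."""

    def __init__(self, hidden_size: int = 768, vocab_size: int = 100,
                 num_extra_layers: int = 2):
        super().__init__()
        self.extra_layers = nn.Sequential(*[
            nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.GELU())
            for _ in range(num_extra_layers)
        ])
        self.lm_head = nn.Linear(hidden_size, vocab_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, time, hidden) from the Transformer encoder
        return self.lm_head(self.extra_layers(hidden_states))  # (batch, time, vocab)
```

Because the class differs from the stock `HubertForCTC`, loading the checkpoint with the generic `AutoModel` machinery would silently drop these extra weights, which is why the repository ships its own model class.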
|
|
|
## Running inference

The `run_asr.py` file illustrates how to load the model for inference (**load_asr_model**) and how to produce a transcription for an audio file (**run_asr_inference**).

Please install the dependencies pinned in `requirements.txt` to avoid incorrect model loading.
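For reference, inference with a CTC model typically ends with greedy CTC decoding: take the per-frame argmax, merge consecutive duplicates, and drop blank tokens. The helper below is an illustrative sketch of that step, not the exact code in `run_asr.py`.

```python
from itertools import groupby

def ctc_greedy_decode(frame_ids, blank_id=0):
    """Greedy CTC decoding: merge consecutive duplicate predictions,
    then drop blanks. (Illustrative; run_asr_inference may decode differently.)"""
    return [tok for tok, _ in groupby(frame_ids) if tok != blank_id]

# e.g. per-frame argmax ids -> token ids
print(ctc_greedy_decode([0, 7, 7, 0, 3, 3, 3, 0, 7]))  # → [7, 3, 7]
```

The resulting token ids would then be mapped back to characters with the model's tokenizer/vocabulary to obtain the final transcription.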