---
license: cc-by-nc-sa-4.0
base_model: utter-project/mHuBERT-147
datasets:
- FBK-MT/Speech-MASSIVE
- FBK-MT/Speech-MASSIVE-test
- mozilla-foundation/common_voice_17_0
- google/fleurs
language:
- fr
metrics:
- wer
- cer
pipeline_tag: automatic-speech-recognition
---

**This is a CTC-based Automatic Speech Recognition system for French.**
This model is part of the SLU demo available here: [LINK TO THE DEMO GOES HERE]
It is based on the [mHuBERT-147](https://huggingface.co/utter-project/mHuBERT-147) speech foundation model.

* Training data: XX hours
* Normalization: Whisper normalization
* Performance:


# Table of Contents:
1. Training Parameters
2. [ASR Model class](https://huggingface.co/naver/mHuBERT-147-ASR-fr#ASR-Model-class)
3. Running inference

## Training Parameters
The training parameters are available in `config.yaml`.
We downsample the Common Voice dataset to 70,000 utterances.
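
As an illustration of the downsampling step, here is a minimal sketch that keeps a fixed-size subset of utterance identifiers. It assumes uniform random sampling with a fixed seed; the model card does not state how the 70,000 utterances were selected, and the `downsample` helper and the identifier format are hypothetical.

```python
import random


def downsample(utterance_ids, target_size, seed=0):
    """Return at most `target_size` utterance ids, sampled without replacement.

    Assumption: uniform random sampling. The actual selection procedure
    used for this model is not documented here.
    """
    if len(utterance_ids) <= target_size:
        return list(utterance_ids)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(utterance_ids, target_size)


# Hypothetical ids standing in for Common Voice utterances.
all_ids = [f"utt_{i:06d}" for i in range(100_000)]
subset = downsample(all_ids, 70_000)
print(len(subset))
```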

## ASR Model class

We use the `mHubertForCTC` class for our model, which is nearly identical to the existing `HubertForCTC` class.
The key difference is that we added a few hidden layers at the end of the Transformer stack, just before the `lm_head`.
The code is available in [CTC_model.py](https://huggingface.co/naver/mHuBERT-147-ASR-fr/blob/main/CTC_model.py).
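
To make the idea concrete, the following is a rough sketch of a CTC head with extra hidden layers placed before the output projection. It is not the code from `CTC_model.py`: the class name `CTCHeadSketch`, the number of extra layers, the ReLU activation, the vocabulary size, and the hidden size of 768 are all assumptions chosen for illustration.

```python
import torch
from torch import nn


class CTCHeadSketch(nn.Module):
    """Hypothetical head: extra hidden layers followed by an lm_head projection."""

    def __init__(self, hidden_size=768, vocab_size=32, num_extra_layers=2):
        super().__init__()
        layers = []
        for _ in range(num_extra_layers):
            # Assumed layer type and activation; the real model may differ.
            layers += [nn.Linear(hidden_size, hidden_size), nn.ReLU()]
        self.extra_layers = nn.Sequential(*layers)
        self.lm_head = nn.Linear(hidden_size, vocab_size)

    def forward(self, hidden_states):
        # hidden_states: (batch, frames, hidden_size), as produced by the encoder.
        x = self.extra_layers(hidden_states)
        logits = self.lm_head(x)
        # CTC training and decoding operate on per-frame log-probabilities.
        return logits.log_softmax(dim=-1)


head = CTCHeadSketch()
frames = torch.randn(1, 50, 768)  # stand-in for encoder output
log_probs = head(frames)
print(log_probs.shape)
```

The head maps each encoder frame to a distribution over the vocabulary, so the output keeps the batch and frame dimensions and replaces the hidden dimension with the vocabulary size.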

## Running inference

The `run_asr.py` file shows how to load the model for inference (**load_asr_model**) and how to transcribe an audio file (**run_asr_inference**).
Install the dependencies listed in `requirements.txt`; other package versions may load the model incorrectly.
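
For readers unfamiliar with CTC, the last step of inference turns per-frame predictions into text. The sketch below shows greedy CTC decoding in plain Python: merge consecutive repeated ids, then drop the blank token. It is an illustration only and not the code from `run_asr.py`; the toy vocabulary, the blank id of 0, and the function name `ctc_greedy_decode` are assumptions.

```python
def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0):
    """Collapse repeated frame predictions and remove blanks (greedy CTC)."""
    tokens = []
    previous = None
    for idx in frame_ids:
        # Emit a token only when the prediction changes and is not the blank.
        if idx != previous and idx != blank_id:
            tokens.append(id_to_token[idx])
        previous = idx
    return "".join(tokens)


# Toy vocabulary and per-frame argmax ids (hypothetical values).
vocab = {0: "<blank>", 1: "o", 2: "u", 3: "i"}
frame_ids = [0, 1, 1, 0, 2, 2, 2, 0, 3, 0]
print(ctc_greedy_decode(frame_ids, vocab))  # oui
```

A blank between two identical ids keeps both, which is how CTC represents doubled characters.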