naver/mHuBERT-147-ASR-fr


metadata

license: cc-by-nc-sa-4.0
base_model: utter-project/mHuBERT-147
datasets:
  - FBK-MT/Speech-MASSIVE
  - FBK-MT/Speech-MASSIVE-test
  - mozilla-foundation/common_voice_17_0
  - google/fleurs
language:
  - fr
metrics:
  - wer
  - cer
pipeline_tag: automatic-speech-recognition

This is a CTC-based automatic speech recognition (ASR) system for French, built on the mHuBERT-147 speech foundation model. It is part of the SLU demo available here: [LINK TO THE DEMO GOES HERE].

Training data: XX hours
Normalization: Whisper normalization
Performance:
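
The metadata lists WER and CER as the evaluation metrics. Both are edit-distance ratios computed on normalized text. The sketch below shows the general computation only: `simple_normalize` is a hypothetical stand-in (lowercasing and punctuation removal), not the Whisper normalizer, and all function names are illustrative rather than taken from this repository.

```python
import re


def simple_normalize(text: str) -> str:
    """Stand-in normalizer (NOT the Whisper normalizer): lowercase, drop punctuation."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)
    return " ".join(text.split())


def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two sequences (substitution, insertion, deletion)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref_words = simple_normalize(reference).split()
    hyp_words = simple_normalize(hypothesis).split()
    if not ref_words:
        raise ValueError("reference is empty after normalization")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance divided by reference length."""
    ref_chars = list(simple_normalize(reference))
    hyp_chars = list(simple_normalize(hypothesis))
    if not ref_chars:
        raise ValueError("reference is empty after normalization")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


print(wer("le chat dort", "le chien dort"))
print(cer("chat", "chut"))
```

Because normalization changes which differences count as errors, scores are only comparable between systems evaluated with the same normalizer.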

Table of Contents:

Training Parameters
ASR Model class
Running inference

Training Parameters

The training parameters are available in config.yaml. We downsample the Common Voice dataset to 70,000 utterances.
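
The sampling procedure used for the downsampling is not described here. As one plausible way to do it, the sketch below assumes uniform random sampling without replacement and a fixed seed for reproducibility; the function name `downsample`, the seed value, and the use of integer utterance IDs are all hypothetical.

```python
import random


def downsample(utterance_ids, target_size=70_000, seed=0):
    """Return at most `target_size` IDs, sampled uniformly without replacement."""
    ids = list(utterance_ids)
    if len(ids) <= target_size:
        # Nothing to drop: the dataset is already small enough.
        return ids
    rng = random.Random(seed)  # Local RNG so the global random state is untouched.
    return rng.sample(ids, target_size)


subset = downsample(range(100_000))
print(len(subset))
```

Fixing the seed means the same subset is selected on every run, which keeps training runs comparable.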

ASR Model class

We use the mHubertForCTC class for our model, which is nearly identical to the existing HubertForCTC class. The key difference is that we add a few hidden layers at the end of the Transformer stack, just before the lm_head. The code is available in CTC_model.py.
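
To make the architectural idea concrete, here is a minimal PyTorch sketch of a CTC head with extra hidden layers placed before the final vocabulary projection. It is not the code from CTC_model.py: the class name `CTCHeadWithExtraLayers`, the number of extra layers, the ReLU activation, and the sizes (768 hidden units, 40 output tokens) are all assumptions chosen for illustration.

```python
import torch
from torch import nn


class CTCHeadWithExtraLayers(nn.Module):
    """Hypothetical head: extra hidden layers followed by a vocabulary projection."""

    def __init__(self, hidden_size: int, vocab_size: int, num_extra_layers: int = 2):
        super().__init__()
        layers = []
        for _ in range(num_extra_layers):
            layers += [nn.Linear(hidden_size, hidden_size), nn.ReLU()]
        self.extra_layers = nn.Sequential(*layers)
        # Final projection to per-frame vocabulary logits, as in a standard CTC head.
        self.lm_head = nn.Linear(hidden_size, vocab_size)

    def forward(self, encoder_states: torch.Tensor) -> torch.Tensor:
        # encoder_states: (batch, frames, hidden_size), the Transformer stack output.
        hidden = self.extra_layers(encoder_states)
        return self.lm_head(hidden)  # (batch, frames, vocab_size)


head = CTCHeadWithExtraLayers(hidden_size=768, vocab_size=40)
logits = head(torch.randn(1, 50, 768))
print(logits.shape)
```

The head keeps the frame dimension unchanged, so the output can be fed to a CTC loss or decoder exactly as with a plain linear lm_head.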

Running inference

The run_asr.py file illustrates how to load the model for inference (load_asr_model) and how to transcribe an audio file (run_asr_inference). Install the dependencies listed in requirements.txt to avoid incorrect model loading.
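
run_asr.py remains the reference for inference, and its internals are not reproduced here. As background on what a CTC model's output needs before it becomes text, the sketch below shows greedy CTC decoding: take the most likely token per frame, merge consecutive repeats, then drop blanks. The function name, the toy vocabulary, and `blank_id=0` are assumptions for illustration; the real vocabulary and blank index come from the model's tokenizer.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse repeated frame predictions, then remove CTC blank tokens."""
    # Step 1: merge runs of identical consecutive token IDs.
    collapsed = [token for token, _ in groupby(frame_ids)]
    # Step 2: drop blanks and map the remaining IDs to characters.
    return "".join(id_to_char[t] for t in collapsed if t != blank_id)


# Toy vocabulary (hypothetical); ID 0 is the blank token.
vocab = {1: "s", 2: "a", 3: "l", 4: "u", 5: "t", 6: "e"}

print(ctc_greedy_decode([1, 1, 0, 2, 2, 3, 0, 4, 4, 4, 5], vocab))
print(ctc_greedy_decode([6, 3, 3, 0, 3, 6, 6], vocab))
```

The second call shows why the blank matters: the blank between the two runs of `l` keeps them from being merged, so doubled letters survive decoding.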