mHuBERT-147-ASR-fr / README.md
metadata
license: cc-by-nc-sa-4.0
base_model: utter-project/mHuBERT-147
datasets:
  - FBK-MT/Speech-MASSIVE
  - FBK-MT/Speech-MASSIVE-test
  - mozilla-foundation/common_voice_17_0
  - google/fleurs
language:
  - fr
metrics:
  - wer
  - cer
pipeline_tag: automatic-speech-recognition

This is a CTC-based Automatic Speech Recognition (ASR) system for French, built on the mHuBERT-147 speech foundation model. It is part of the SLU demo available here: [LINK TO THE DEMO GOES HERE].

  • Training data: XX hours
  • Normalization: Whisper normalization
  • Performance:

Table of Contents:

  1. Training Parameters
  2. ASR Model class
  3. Running inference

Training Parameters

The training parameters are available in config.yaml. We downsample the Common Voice dataset to 70,000 utterances.
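As a rough illustration of the downsampling step, the sketch below draws a fixed-size random subset of utterance IDs. The function name `downsample`, the seeded uniform sampling, and the seed value are assumptions made for this example; the selection strategy actually used for training is not described here.

```python
import random


def downsample(utterances, max_items=70_000, seed=0):
    """Return at most `max_items` utterances, sampled without replacement.

    Hypothetical helper: uniform random sampling with a fixed seed is an
    assumption, not necessarily how the training subset was selected.
    """
    items = list(utterances)
    if len(items) <= max_items:
        # Nothing to drop: keep the data as-is, in its original order.
        return items
    rng = random.Random(seed)  # local RNG so the subset is reproducible
    return rng.sample(items, max_items)


if __name__ == "__main__":
    all_ids = [f"utt_{i}" for i in range(100_000)]
    subset = downsample(all_ids)
    print(len(subset))
```

Using a local `random.Random(seed)` instance keeps the subset reproducible across runs without touching the global random state.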

ASR Model class

The model uses the mHubertForCTC class, which is nearly identical to the existing HubertForCTC class. The key difference is a few additional hidden layers at the end of the Transformer stack, just before the lm_head. The code is available in CTC_model.py.

Running inference

The run_asr.py file shows how to load the model for inference (load_asr_model) and how to transcribe an audio file (run_asr_inference). Install the package versions listed in requirements.txt to avoid incorrect model loading.
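For readers unfamiliar with CTC, the sketch below shows the generic greedy decoding rule that turns per-frame predictions into text: collapse consecutive repeats, then drop the blank symbol. It is not taken from run_asr.py; the function name `ctc_greedy_decode`, the toy vocabulary, and the blank ID of 0 are assumptions made for illustration.

```python
def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0):
    """Collapse repeated frame predictions and remove CTC blanks.

    `frame_ids` is the most likely token ID for each audio frame.
    Hypothetical helper: the vocabulary and blank ID are toy values.
    """
    tokens = []
    previous = None
    for idx in frame_ids:
        # Emit a token only when the prediction changes and is not the blank.
        if idx != previous and idx != blank_id:
            tokens.append(id_to_token[idx])
        previous = idx
    return "".join(tokens)


if __name__ == "__main__":
    toy_vocab = {0: "<blank>", 1: "o", 2: "u", 3: "i"}
    frames = [0, 1, 1, 0, 2, 2, 2, 0, 3, 0]
    print(ctc_greedy_decode(frames, toy_vocab))  # prints "oui"
```

The blank between two identical predictions is what allows genuinely doubled characters to survive: frames `[1, 0, 1]` decode to "oo", while `[1, 1]` decode to "o".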