---
license: cc-by-nc-sa-4.0
base_model: utter-project/mHuBERT-147
datasets:
- FBK-MT/Speech-MASSIVE
- FBK-MT/Speech-MASSIVE-test
- mozilla-foundation/common_voice_17_0
- google/fleurs
language:
- fr
metrics:
- wer
- cer
pipeline_tag: automatic-speech-recognition
---
|
|
|
**This is a CTC-based Automatic Speech Recognition system for French.**

This model is part of the SLU demo available here: [LINK TO THE DEMO GOES HERE]

It is based on the [mHuBERT-147](https://huggingface.co/utter-project/mHuBERT-147) speech foundation model.
|
|
|
* Training data: XX hours
* Normalization: Whisper normalization
* Performance:
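The Whisper normalization mentioned above can be approximated as follows. This is an illustrative sketch only: the actual pipeline presumably uses the multilingual `BasicTextNormalizer` shipped with Whisper, which handles additional symbol and diacritic edge cases.

```python
import re
import unicodedata

def basic_normalize(text: str) -> str:
    """Rough approximation of Whisper's basic (multilingual) normalizer:
    lowercase, strip punctuation/symbols, collapse whitespace.
    Illustrative only; not the exact normalizer used for this model."""
    text = unicodedata.normalize("NFKC", text.lower())
    text = re.sub(r"[^\w\s]", " ", text)      # drop punctuation, keep accented letters
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace

print(basic_normalize("Bonjour, le Monde !"))  # → bonjour le monde
```

Note that normalization is applied to both references and hypotheses before computing WER/CER, so scores are not comparable to unnormalized evaluations.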
|
|
|
|
|
# Table of Contents
1. Training Parameters
2. [ASR Model class](https://huggingface.co/naver/mHuBERT-147-ASR-fr#ASR-Model-class)
3. Running inference
|
|
|
## Training Parameters

The training parameters are available in `config.yaml`.

We downsample the CommonVoice dataset to 70,000 utterances.
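A reproducible way to take such a subset can be sketched as below. The sampling procedure and seed are not specified in this card; the helper is hypothetical.

```python
import random

def sample_indices(num_rows: int, k: int, seed: int = 42) -> list[int]:
    """Pick k distinct row indices reproducibly (hypothetical helper;
    the actual subsampling procedure used for training is not documented here)."""
    return sorted(random.Random(seed).sample(range(num_rows), k))

# With the Hugging Face `datasets` library, the subset could then be taken as:
#   ds = load_dataset("mozilla-foundation/common_voice_17_0", "fr", split="train")
#   ds = ds.select(sample_indices(ds.num_rows, 70_000))
```

Sorting the indices keeps the subset in the original row order, which makes later inspection and caching more predictable.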
|
|
|
## ASR Model class

We use the `mHubertForCTC` class for our model, which is nearly identical to the existing `HubertForCTC` class.
The key difference is that we add a few additional hidden layers at the end of the Transformer stack, just before the `lm_head`.
The code is available in [CTC_model.py](https://huggingface.co/naver/mHuBERT-147-ASR-fr/blob/main/CTC_model.py).
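The idea of inserting extra hidden layers before the CTC head can be sketched in plain PyTorch as follows. The layer count, sizes, and activation are illustrative; the real configuration is defined in `CTC_model.py` and `config.yaml`.

```python
import torch
import torch.nn as nn

class ExtraLayersCTCHead(nn.Module):
    """Sketch of the head described above: a few extra hidden layers
    inserted between the Transformer output and the lm_head.
    Dimensions and depth are illustrative, not the released configuration."""

    def __init__(self, hidden_size: int = 768, vocab_size: int = 100,
                 num_extra_layers: int = 2):
        super().__init__()
        self.extra_layers = nn.Sequential(*[
            nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.GELU())
            for _ in range(num_extra_layers)
        ])
        self.lm_head = nn.Linear(hidden_size, vocab_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, time, hidden) from the Transformer encoder
        return self.lm_head(self.extra_layers(hidden_states))  # (batch, time, vocab)
```

Because the class differs from the stock `HubertForCTC`, loading the checkpoint with the generic `AutoModel` machinery would silently drop these extra weights, which is why the repository ships its own model class.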
|
|
|
## Running inference

The `run_asr.py` file illustrates how to load the model for inference (**load_asr_model**) and how to produce a transcription for an audio file (**run_asr_inference**).

Please install the dependencies pinned in `requirements.txt` to avoid incorrect model loading.
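For reference, inference with a CTC model typically ends with greedy CTC decoding: take the per-frame argmax, merge consecutive duplicates, and drop blank tokens. The helper below is an illustrative sketch of that step, not the exact code in `run_asr.py`.

```python
from itertools import groupby

def ctc_greedy_decode(frame_ids, blank_id=0):
    """Greedy CTC decoding: merge consecutive duplicate predictions,
    then drop blanks. (Illustrative; run_asr_inference may decode differently.)"""
    return [tok for tok, _ in groupby(frame_ids) if tok != blank_id]

# e.g. per-frame argmax ids -> token ids
print(ctc_greedy_decode([0, 7, 7, 0, 3, 3, 3, 0, 7]))  # → [7, 3, 7]
```

The resulting token ids would then be mapped back to characters with the model's tokenizer/vocabulary to obtain the final transcription.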