|
--- |
|
license: cc-by-nc-sa-4.0 |
|
base_model: utter-project/mHuBERT-147 |
|
datasets: |
|
- FBK-MT/Speech-MASSIVE |
|
- FBK-MT/Speech-MASSIVE-test |
|
- mozilla-foundation/common_voice_17_0 |
|
- google/fleurs |
|
language: |
|
- fr |
|
metrics: |
|
- wer |
|
- cer |
|
pipeline_tag: automatic-speech-recognition |
|
--- |
|
|
|
**This is a small CTC-based Automatic Speech Recognition system for French.** |
|
|
|
This model is part of our SLU demo available here: https://huggingface.co/spaces/naver/French-SLU-DEMO-Interspeech2024 |
|
|
|
Please check our blog post, available at: TBD
|
|
|
* Training data: 123 hours (84,707 utterances) |
|
* Normalization: Whisper normalization (see the sketch below)
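
A minimal sketch of the normalization step (assuming the multilingual `BasicTextNormalizer` from the `openai-whisper` package; the exact normalizer configuration is an assumption):

```python
# Hedged sketch: assumes the multilingual BasicTextNormalizer from the
# openai-whisper package (pip install openai-whisper).
from whisper.normalizers import BasicTextNormalizer

normalizer = BasicTextNormalizer()
# Lowercases the text and strips punctuation/symbols
print(normalizer("Bonjour, tout le monde !"))
```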
|
|
|
# Table of Contents: |
|
1. [Performance](https://huggingface.co/naver/mHuBERT-147-ASR-fr#performance) |
|
2. [Training Parameters](https://huggingface.co/naver/mHuBERT-147-ASR-fr#training-parameters) |
|
3. [ASR Model class](https://huggingface.co/naver/mHuBERT-147-ASR-fr#asr-model-class) |
|
4. [Running inference](https://huggingface.co/naver/mHuBERT-147-ASR-fr#running-inference) |
|
|
|
## Performance |
|
|
|
| **Dataset**         | **dev WER** | **dev CER** | **test WER** | **test CER** |
|:-------------------:|:-----------:|:-----------:|:------------:|:------------:|
| **Speech-MASSIVE**  | 9.2         | 2.6         | 9.6          | 2.9          |
| **FLEURS-102**      | 20.0        | 7.0         | 22.0         | 7.7          |
| **Common Voice 17** | 16.0        | 4.9         | 19.0         | 6.5          |
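
The WER and CER above can be computed with standard tooling. Here is a minimal sketch using the Hugging Face `evaluate` library (the exact scoring script and the normalization applied to hypotheses and references are assumptions):

```python
# Hedged sketch: scores WER/CER with the `evaluate` library (requires jiwer);
# the exact scoring pipeline behind the table above may differ.
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

# Toy, already-normalized prediction/reference pair
predictions = ["bonjour tout le monde"]
references = ["bonjour à tout le monde"]

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```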
|
|
|
## Training Parameters |
|
|
|
This is a [mHuBERT-147](https://huggingface.co/utter-project/mHuBERT-147) ASR fine-tuned model. |
|
The training parameters are available in [config.json](https://huggingface.co/naver/mHuBERT-147-ASR-fr/blob/main/config.json). |
|
We highlight the use of 0.3 for `hubert.final_dropout`, which we found to be very helpful for convergence. We also use fp32 training, as we found fp16 training to be unstable.
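
For illustration, here is how these two settings could be passed when fine-tuning with the stock `HubertForCTC` class. This is a minimal sketch with placeholder values (e.g. the vocabulary size), not our exact training recipe; the actual values are in [config.json](https://huggingface.co/naver/mHuBERT-147-ASR-fr/blob/main/config.json).

```python
# Illustrative sketch only: placeholder arguments, not the exact training setup.
from transformers import HubertForCTC, TrainingArguments

model = HubertForCTC.from_pretrained(
    "utter-project/mHuBERT-147",
    final_dropout=0.3,          # higher final dropout helped convergence
    vocab_size=43,              # placeholder: size of the CTC character vocabulary
    ctc_loss_reduction="mean",  # placeholder choice
)

training_args = TrainingArguments(
    output_dir="mhubert-147-asr-fr",
    fp16=False,                 # train in fp32; fp16 was unstable
)
```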
|
|
|
## ASR Model Class |
|
|
|
We use the mHubertForCTC class for our model, which is nearly identical to the existing HubertForCTC class. |
|
The key difference is that we've added a few additional hidden layers at the end of the Transformer stack, just before the lm_head. |
|
The code is available in [CTC_model.py](https://huggingface.co/naver/mHuBERT-147-ASR-fr/blob/main/inference_code/CTC_model.py). |
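
For intuition, here is a conceptual sketch of that idea. The number and size of the extra layers below are placeholders; please use the real class from `CTC_model.py` to load the checkpoint.

```python
# Conceptual sketch only: the actual implementation lives in
# inference_code/CTC_model.py and should be used for loading the model.
import torch.nn as nn
from transformers import HubertForCTC

class SketchMHubertForCTC(HubertForCTC):
    def __init__(self, config):
        super().__init__(config)
        # Hypothetical extra hidden layers placed between the Transformer
        # encoder output and the CTC lm_head (count and sizes are assumptions).
        self.intermediate_ffn = nn.Sequential(
            nn.Linear(config.hidden_size, config.hidden_size),
            nn.GELU(),
            nn.Dropout(config.final_dropout),
        )
        # In the actual class, forward() runs the encoder hidden states through
        # these layers before projecting them with self.lm_head.
```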
|
|
|
## Running Inference |
|
|
|
The [run_inference.py](https://huggingface.co/naver/mHuBERT-147-ASR-fr/blob/main/inference_code/run_inference.py) file illustrates how to load the model for inference (**load_asr_model**) and how to produce a transcription for an audio file (**run_asr_inference**).
|
Please follow the [requirements file](https://huggingface.co/naver/mHuBERT-147-ASR-fr/blob/main/requirements.txt) to avoid incorrect model loading. |
|
|
|
Here is a simple example of the inference loop. Please note that the sampling rate of the input audio must be 16,000 Hz (16 kHz).
|
|
|
```python
from inference_code.run_inference import load_asr_model, run_asr_inference

# Load the fine-tuned model and its processor
model, processor = load_asr_model()

# Transcribe a 16 kHz audio file
prediction = run_asr_inference(model, processor, your_audio_file)
```
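
If your recording is not already sampled at 16 kHz, you can resample it beforehand. Below is a small sketch using `librosa` and `soundfile`; the file names are placeholders, and we assume **run_asr_inference** takes a path to an audio file as in the example above.

```python
# Hypothetical pre-processing step: resample an arbitrary recording to the
# required 16 kHz before transcription. File names are placeholders.
import librosa
import soundfile as sf

audio, _ = librosa.load("recording.mp3", sr=16_000)  # decode and resample to 16 kHz
sf.write("recording_16k.wav", audio, 16_000)         # save as a 16 kHz WAV file

prediction = run_asr_inference(model, processor, "recording_16k.wav")
```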