---
license: cc-by-nc-4.0
language: ddn
metrics:
- wer
tags:
- text-to-audio
- automatic-speech-recognition
- wav2vec2-fine-tuning
- dendi-text-to-speech
model-index:
- name: Dendi Numerals ASR
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: dendi
      type: dendi_numbers_dataset
    metrics:
    - name: Test WER
      type: wer
      value: 18.18
pipeline_tag: automatic-speech-recognition
---

# CreaTiv Team (CTT): Dendi Numerals Automatic Speech Recognition

This repository contains an Automatic Speech Recognition (ASR) model built specifically for recognizing numerals in the Dendi (ddn) language. The model recognizes numbers from 0 to 1,000,000,000 when spoken in Dendi.

This model is part of CreaTiv Team's [Noulinmon](https://noulinmon.baruwuu.bj/) project, a user-friendly mobile app designed to make calculations accessible in six local languages of Benin, featuring voice reading and AI capabilities.

You can find more CTT-ASR models on the Hugging Face Hub: [ssid32/ctt-asr](https://huggingface.co/models?sort=trending&search=ssid32).

CTT-ASR is available in the 🤗 Transformers library from version 4.4 onwards.

## Model Details

The model is a fine-tuned version of [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Dendi.
When using this model, make sure that your speech input is sampled at 16 kHz.

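
Because the model expects 16 kHz input, it can help to inspect a recording before running inference. The following is a minimal sketch using only the Python standard library. It assumes the recording is an uncompressed PCM WAV file; the helper names `describe_wav` and `write_silent_wav` are hypothetical and not part of this repository.

```python
import os
import tempfile
import wave

TARGET_SAMPLE_RATE = 16_000  # sampling rate the model card asks for


def describe_wav(path: str) -> dict:
    """Read basic properties from the header of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        frames = wav_file.getnframes()
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "duration_s": frames / sample_rate,
        # True when the file must be resampled before inference
        "needs_resampling": sample_rate != TARGET_SAMPLE_RATE,
        # True when the file is not mono and should be downmixed
        "needs_downmix": channels != 1,
    }


def write_silent_wav(path: str, sample_rate: int, seconds: float = 1.0) -> None:
    """Write a silent mono 16-bit WAV file (only used for the demo below)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * int(sample_rate * seconds))


if __name__ == "__main__":
    # Demo on a synthetic 8 kHz file, which should be flagged for resampling
    with tempfile.TemporaryDirectory() as tmp_dir:
        demo_path = os.path.join(tmp_dir, "demo_8khz.wav")
        write_silent_wav(demo_path, sample_rate=8_000)
        print(describe_wav(demo_path))
```

If `needs_resampling` is true, the audio should be converted to 16 kHz before it is passed to the processor, for example with `torchaudio.functional.resample`.
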
## Usage

To use this model, first install the latest version of the 🤗 Transformers library, along with the other packages the snippet below imports:

```
pip install --upgrade transformers accelerate torch torchaudio
```

Then, run inference with the following code snippet:

```python
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

TARGET_SAMPLE_RATE = 16_000  # the model expects 16 kHz audio

processor = Wav2Vec2Processor.from_pretrained("ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals")
model = Wav2Vec2ForCTC.from_pretrained("ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals")

# torchaudio returns a (channels, samples) tensor and the file's own sampling rate
speech_array, sampling_rate = torchaudio.load("audio_test.wav")

# Downmix to mono, then resample if the file is not already at 16 kHz
speech_array = speech_array.mean(dim=0)
if sampling_rate != TARGET_SAMPLE_RATE:
    speech_array = torchaudio.functional.resample(speech_array, sampling_rate, TARGET_SAMPLE_RATE)
speech_array = speech_array.numpy()

inputs = processor(speech_array, sampling_rate=TARGET_SAMPLE_RATE, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

# Greedy CTC decoding: take the most likely token at each frame, then collapse to text
output = processor.batch_decode(torch.argmax(logits, dim=-1))

print("Output:", output)
```

For the sample audio, the model produces the following output:

```
Output: ['zangu ihaaku nda weiguu']
```

In this case, the output represents the numeral **850** in the Dendi language.

### Evaluation result

On the test set, the model achieves a Word Error Rate (WER) of **18.18%**.

## Authors

This model was developed by:

- Salim KORA GUERA (Hugging Face username: [ssid32](https://huggingface.co/ssid32)) | (koravant1@gmail.com)
- Etienne TOVIMAFA (Hugging Face username: [MrBendji](https://huggingface.co/MrBendji)) | (abiodouneti@gmail.com)

## Citation

```bibtex
@misc {
    author       = { {Salim KORA GUERA and Etienne TOVIMAFA} },
    title        = { wav2vec2-xlsr-dendi-ddn-for-numerals },
    year         = 2024,
    url          = { https://huggingface.co/ssid32/wav2vec2-xlsr-dendi-ddn-for-numerals },
    doi          = { 10.57967/hf/2930 },
    publisher    = { Hugging Face }
}
```

## License

The model is licensed under **CC-BY-NC 4.0**.
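
As a note on the WER figure reported in the evaluation result: WER is the word-level edit distance between a reference transcript and the model output (substitutions, deletions and insertions), divided by the number of reference words. The sketch below illustrates the metric in plain Python. It is not the authors' evaluation script, the function name `word_error_rate` is hypothetical, and the hypothesis string in the demo is an invented misrecognition used only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] = edit distance between the processed reference prefix
    # and the first j hypothesis words
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row.append(
                min(
                    previous_row[j] + 1,  # deletion of a reference word
                    current_row[j - 1] + 1,  # insertion of a hypothesis word
                    previous_row[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous_row = current_row

    return previous_row[-1] / len(ref_words)


if __name__ == "__main__":
    reference = "zangu ihaaku nda weiguu"  # transcript from the usage example
    hypothesis = "zangu ihaaku nda weigu"  # invented error, for illustration only
    print(f"WER: {word_error_rate(reference, hypothesis) * 100:.2f}%")
```

Here one substituted word out of four reference words gives a WER of 25%. Scores over a whole test set are typically obtained by summing the edits across all utterances and dividing by the total number of reference words, rather than by averaging per-utterance values.
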