bofenghuang
/

phonemizer-wav2vec2-ctc-french

Automatic Speech Recognition

Inference Endpoints

Model card Files Files and versions Community

bofenghuang commited on Jun 28

Commit

1cdb121

•

1 Parent(s): a3037d1

up

Files changed (1) hide show

README.md +84 -0

README.md CHANGED Viewed

@@ -1,3 +1,87 @@
 ---
 license: mit
 ---

 ---
 license: mit
+language: fr
+datasets:
+- mozilla-foundation/common_voice_13_0
+tags:
+- automatic-speech-recognition
 ---
+# Wav2vec2-CTC-based French Phonemizer
+## Usage
+*Infer audio*
+```python
+import soundfile as sf
+import torch
+from transformers import AutoModelForCTC, AutoProcessor, pipeline
+device = "cuda:0" if torch.cuda.is_available() else "cpu"
+torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
+# Load model
+model_name_or_path = "bofenghuang/phonemizer-wav2vec2-ctc-french"
+processor = AutoProcessor.from_pretrained(model_name_or_path)
+model_sample_rate = processor.feature_extractor.sampling_rate
+model = AutoModelForCTC.from_pretrained(model_name_or_path, torch_dtype=torch_dtype)
+model.to(device)
+# Init pipeline
+pipe = pipeline(
+    "automatic-speech-recognition",
+    model=model,
+    feature_extractor=processor.feature_extractor,
+    tokenizer=processor.tokenizer,
+    torch_dtype=torch_dtype,
+    device=device,
+)
+# Example audio
+audio_file_path = "/path/to/example/wav/file"
+# Infer with pipeline
+result = pipe(audio_file_path)
+print(result["text"])
+# Infer w/ lower-level api
+waveform, sample_rate = sf.read(audio_file_path, start=0, frames=-1, dtype="float32", always_2d=False)
+input_dict = processor(waveform, sampling_rate=model_sample_rate, return_tensors="pt")
+with torch.inference_mode():
+    input_values = input_dict.input_values.to(device, dtype=torch_dtype)
+    logits = model(input_values).logits
+predicted_ids = torch.argmax(logits, dim=-1)
+predicted_text = processor.batch_decode(predicted_ids)[0]
+print(predicted_text)
+```
+*Phonemes were generated using the following code snippet:*
+```python
+# !pip install phonemizer
+from phonemizer.backend import EspeakBackend
+from phonemizer.separator import Separator
+# initialize the espeak backend for French
+backend = EspeakBackend("fr-fr", language_switch="remove-flags")
+# separate phones by a space and ignoring words boundaries
+separator = Separator(phone=None, word=" ", syllable="")
+def phonemize_text_phonemizer(s):
+    return backend.phonemize([s], separator=separator, strip=True, njobs=1)[0]
+input_str = "ce modèle est utilisé pour identifier les phonèmes dans l'audio entrant"
+print(phonemize_text_phonemizer(input_str))
+# 'sə modɛl ɛt ytilize puʁ idɑ̃tifje le fonɛm dɑ̃ lodjo ɑ̃tʁɑ̃'
+```
+## Acknowledgement
+Inspired by [Cnam-LMSSC/wav2vec2-french-phonemizer](https://huggingface.co/Cnam-LMSSC/wav2vec2-french-phonemizer)