facebook/wav2vec2-conformer-rope-large-100h-ft

Automatic Speech Recognition

wav2vec2-conformer

hf-asr-leaderboard

patrickvonplaten committed on May 1, 2022

Commit bd22d90 • 1 Parent(s): eca2923

Create README.md

Files changed (1)

README.md +55 -0

README.md ADDED

	@@ -0,0 +1,55 @@

+---
+language: en
+datasets:
+- librispeech_asr
+tags:
+- speech
+- audio
+- automatic-speech-recognition
+- hf-asr-leaderboard
+license: apache-2.0
+---
+# Wav2Vec2-Conformer-Large-100h with Rotary Position Embeddings
+[Facebook's Wav2Vec2 Conformer (TODO-add link)]()
+Wav2Vec2 Conformer with rotary position embeddings, pretrained and fine-tuned on 100 hours of LibriSpeech on speech audio sampled at 16 kHz. When using the model, make sure that your speech input is also sampled at 16 kHz.
+[Paper (TODO)](https://arxiv.org/abs/2006.11477)
+Authors: ...
+**Abstract**
+...
+The original model can be found under https://github.com/pytorch/fairseq/tree/master/examples/wav2vec#wav2vec-20.
+# Usage
+To transcribe audio files, the model can be used as a standalone acoustic model as follows:
+```python
+from transformers import Wav2Vec2Processor, Wav2Vec2ConformerForCTC
+from datasets import load_dataset
+import torch
+# load model and processor
+processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-conformer-rope-large-100h-ft")
+model = Wav2Vec2ConformerForCTC.from_pretrained("facebook/wav2vec2-conformer-rope-large-100h-ft")
+# load dummy dataset and read sound files
+ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
+# extract input features from the raw waveform (the model expects 16 kHz audio)
+input_values = processor(ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt", padding="longest").input_values
+# retrieve logits; gradients are not needed for inference
+with torch.no_grad():
+    logits = model(input_values).logits
+# take the most likely token id per frame and decode to text
+predicted_ids = torch.argmax(logits, dim=-1)
+transcription = processor.batch_decode(predicted_ids)
+```
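
The last two steps of the usage snippet (argmax over the logits, then `batch_decode`) amount to greedy CTC decoding. As a rough illustration of that idea, and not the library's actual implementation, the sketch below collapses repeated frame predictions, drops the blank symbol, and joins the remaining characters. The vocabulary, the blank id, and the `greedy_ctc_decode` helper are all hypothetical and chosen only for the example; the real mapping comes from the tokenizer files shipped with the checkpoint.

```python
from itertools import groupby

# Hypothetical toy vocabulary, for illustration only.
TOY_VOCAB = {0: "<pad>", 1: "|", 2: "H", 3: "E", 4: "L", 5: "O"}
BLANK_ID = 0          # assumption: the padding token doubles as the CTC blank
WORD_DELIMITER = "|"  # assumption: this symbol marks word boundaries


def greedy_ctc_decode(frame_ids, vocab=TOY_VOCAB, blank_id=BLANK_ID):
    """Turn per-frame argmax ids into text using the greedy CTC rule."""
    # 1. Merge consecutive duplicates, e.g. [2, 2, 0, 3] -> [2, 0, 3].
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Drop blanks; a blank between two equal ids keeps both characters.
    kept = [token_id for token_id in collapsed if token_id != blank_id]
    # 3. Map ids to characters and turn word delimiters into spaces.
    text = "".join(vocab[token_id] for token_id in kept)
    return " ".join(text.replace(WORD_DELIMITER, " ").split())


# One id per audio frame, as torch.argmax(logits, dim=-1) would produce.
frame_ids = [2, 2, 0, 3, 4, 4, 0, 4, 5, 5, 1, 1, 2, 0, 3]
print(greedy_ctc_decode(frame_ids))  # HELLO HE
```

Note how the blank between the two runs of `4` is what allows the double "L" to survive the collapsing step; without it, the repeated frames would merge into a single character.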