patrickvonplaten commited on
Commit
bc45eb4
1 Parent(s): abbaedb

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +39 -1
README.md CHANGED
@@ -56,4 +56,42 @@ To transcribe audio files the model can be used as a standalone acoustic model a
56
  # take argmax and decode
57
  predicted_ids = torch.argmax(logits, dim=-1)
58
  transcription = processor.batch_decode(predicted_ids)
59
- ```
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
56
  # take argmax and decode
57
  predicted_ids = torch.argmax(logits, dim=-1)
58
  transcription = processor.batch_decode(predicted_ids)
59
+ ```
60
+
61
+ ## Evaluation
62
+
63
+ This code snippet shows how to evaluate **facebook/data2vec-audio-base-960h** on LibriSpeech's "clean" and "other" test data.
64
+
65
+ ```python
66
+ from transformers import Wav2Vec2Processor, Data2VecAudioForCTC
67
+ from datasets import load_dataset
68
+ import torch
69
+ from jiwer import wer
70
+
71
+ # load model and processor
72
+ processor = Wav2Vec2Processor.from_pretrained("facebook/data2vec-audio-base-960h")
73
+ model = Data2VecAudioForCTC.from_pretrained("facebook/data2vec-audio-base-960h").to("cuda")
74
+
75
+
76
+ librispeech_eval = load_dataset("librispeech_asr", "clean", split="test")
77
+
78
+ def map_to_pred(batch):
79
+ input_values = processor(batch["audio"][0]["array"], return_tensors="pt", padding="longest").input_values
80
+ with torch.no_grad():
81
+ logits = model(input_values.to("cuda")).logits
82
+
83
+ predicted_ids = torch.argmax(logits, dim=-1)
84
+ transcription = processor.batch_decode(predicted_ids)
85
+ batch["transcription"] = transcription
86
+ return batch
87
+
88
+ result = librispeech_eval.map(map_to_pred, batched=True, batch_size=1, remove_columns=["audio"])
89
+
90
+ print("WER:", wer(result["text"], result["transcription"]))
91
+ ```
92
+
93
+ *Result (WER)*:
94
+
95
+ | "clean" | "other" |
96
+ |---|---|
97
+ | 3.4 | 8.6 |