patrickvonplaten committed on
Commit
bd22d90
1 Parent(s): eca2923

Create README.md

  1. README.md +55 -0
README.md ADDED
@@ -0,0 +1,55 @@
+ ---
+ language: en
+ datasets:
+ - librispeech_asr
+ tags:
+ - speech
+ - audio
+ - automatic-speech-recognition
+ - hf-asr-leaderboard
+ license: apache-2.0
+ ---
+
+ # Wav2Vec2-Conformer-Large-100h with Rotary Position Embeddings
+
+ [Facebook's Wav2Vec2 Conformer (TODO-add link)]()
+
+ Wav2Vec2 Conformer with rotary position embeddings, pretrained and fine-tuned on 100 hours of Librispeech on 16 kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16 kHz.
+
+ [Paper (TODO)](https://arxiv.org/abs/2006.11477)
+
+ Authors: ...
+
+ **Abstract**
+
+ ...
+
+ The original model can be found at https://github.com/pytorch/fairseq/tree/master/examples/wav2vec#wav2vec-20.
+
+
30
+ # Usage
31
+
32
+ To transcribe audio files the model can be used as a standalone acoustic model as follows:
33
+
34
+ ```python
35
+ from transformers import Wav2Vec2Processor, Wav2Vec2ConformerForCTC
36
+ from datasets import load_dataset
37
+ import torch
38
+
39
+ # load model and processor
40
+ processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-conformer-rope-large-100h-ft")
41
+ model = Wav2Vec2ConformerForCTC.from_pretrained("facebook/wav2vec2-conformer-rope-large-100h-ft")
42
+
43
+ # load dummy dataset and read soundfiles
44
+ ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
45
+
46
+ # tokenize
47
+ input_values = processor(ds[0]["audio"]["array"], return_tensors="pt", padding="longest").input_values
48
+
49
+ # retrieve logits
50
+ logits = model(input_values).logits
51
+
52
+ # take argmax and decode
53
+ predicted_ids = torch.argmax(logits, dim=-1)
54
+ transcription = processor.batch_decode(predicted_ids)
55
+ ```
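
The final "take argmax and decode" step in the snippet above is greedy CTC decoding: per-frame predictions are collapsed by merging consecutive repeats and then dropping the blank token. The following is a minimal pure-Python sketch of that idea, not the library's implementation. The function name `ctc_greedy_collapse`, the toy vocabulary, and the choice of `0` as the blank id are assumptions made for illustration; the model's real vocabulary and blank id come from its tokenizer.

```python
from itertools import groupby


def ctc_greedy_collapse(ids, blank_id=0):
    """Merge consecutive repeated ids, then drop the blank id."""
    collapsed = [token for token, _ in groupby(ids)]
    return [token for token in collapsed if token != blank_id]


# Hypothetical toy vocabulary for illustration only (not the model's vocabulary).
# "|" stands in for a word delimiter and id 0 for the CTC blank.
toy_vocab = {0: "<pad>", 1: "H", 2: "I", 3: "|"}

# One predicted id per audio frame, as torch.argmax over the logits would give.
frame_ids = [1, 1, 0, 2, 2, 2, 0, 0, 3, 1, 0, 1]

token_ids = ctc_greedy_collapse(frame_ids)
text = "".join(toy_vocab[i] for i in token_ids).replace("|", " ")
print(text)  # prints "HI HH"
```

The blank between the last two `H` frames is what lets the decoder emit a doubled letter; without it the two frames would merge into one `H`.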