patrickvonplaten
committed on
Commit
•
bd966d7
1
Parent(s):
ee2ffc4
Create README.md
README.md
ADDED
@@ -0,0 +1,30 @@
|
1 |
+
---
|
2 |
+
language: en
|
3 |
+
datasets:
|
4 |
+
- librispeech_asr
|
5 |
+
tags:
|
6 |
+
- speech
|
7 |
+
license: apache-2.0
|
8 |
+
---
|
9 |
+
|
10 |
+
# Wav2Vec2-Conformer-Large with Relative Position Embeddings
|
11 |
+
|
12 |
+
[Facebook's Wav2Vec2 Conformer (TODO-add link)]()
|
13 |
+
|
14 |
+
Wav2Vec2 Conformer with relative position embeddings, pretrained on 960 hours of LibriSpeech on 16 kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16 kHz.
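
The 16 kHz requirement can be checked before any audio reaches the model. The following is a minimal sketch, assuming the input is an uncompressed WAV file; the helper names `wav_sample_rate` and `needs_resampling` are hypothetical and not part of any library. It only reads the file header with Python's standard `wave` module and does not resample anything.

```python
import wave

# 16 kHz, the sampling rate the model card asks for.
EXPECTED_RATE = 16_000


def wav_sample_rate(path):
    """Return the sampling rate in Hz stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, expected_rate=EXPECTED_RATE):
    """Return True if the file's sampling rate differs from the expected one."""
    return wav_sample_rate(path) != expected_rate
```

If `needs_resampling` returns `True`, the audio would have to be resampled to 16 kHz with an audio library of your choice before being passed to the model.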
|
15 |
+
|
16 |
+
**Note**: This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model for **speech recognition**, a tokenizer should be created and the model should be fine-tuned on labeled text data. Check out [this blog](https://huggingface.co/blog/fine-tune-wav2vec2-english) for a more detailed explanation of how to fine-tune the model.
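
Creating a tokenizer starts with a vocabulary. The following is a rough sketch of a character-level vocabulary built from transcriptions, assuming the transcriptions are already normalized and do not contain the `|` character. The function name `build_char_vocab` and the file name `vocab.json` are illustrative choices, not something this model card prescribes.

```python
import json


def build_char_vocab(transcripts):
    """Map every character seen in the transcripts to an integer id."""
    chars = sorted(set("".join(transcripts)))
    vocab = {ch: idx for idx, ch in enumerate(chars)}
    # Use a visible word delimiter instead of the space character.
    if " " in vocab:
        vocab["|"] = vocab.pop(" ")
    # Reserve ids for unknown characters and for padding.
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab


def save_vocab(vocab, path="vocab.json"):
    """Write the vocabulary to a JSON file."""
    with open(path, "w", encoding="utf-8") as vocab_file:
        json.dump(vocab, vocab_file, ensure_ascii=False)
```

A file written this way could then be passed to a CTC tokenizer such as `Wav2Vec2CTCTokenizer` from `transformers`, with `[UNK]`, `[PAD]` and `|` set as the unknown, padding and word delimiter tokens. Treat that step as an assumption about your setup and follow the linked blog post for the full procedure.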
|
17 |
+
|
18 |
+
[Paper (TODO)](https://arxiv.org/abs/2006.11477)
|
19 |
+
|
20 |
+
Authors: ...
|
21 |
+
|
22 |
+
**Abstract**
|
23 |
+
|
24 |
+
...
|
25 |
+
|
26 |
+
The original model can be found under https://github.com/pytorch/fairseq/tree/master/examples/wav2vec#wav2vec-20.
|
27 |
+
|
28 |
+
# Usage
|
29 |
+
|
30 |
+
See [this notebook](https://colab.research.google.com/drive/1FjTsqbYKphl9kL-eILgUc-bl4zVThL8F?usp=sharing) for more information on how to fine-tune the model.
|