patrickvonplaten committed commit 8844157 (parent: c12bf04): Update README.md
README.md CHANGED

```diff
@@ -2,7 +2,6 @@
 language:
 - en
 datasets:
-- common_voice
 tags:
 - speech
 license: apache-2.0
@@ -14,6 +13,12 @@ license: apache-2.0
 
 The large model pretrained on 16kHz sampled speech audio with utterance and speaker contrastive loss. When using the model, make sure that your speech input is also sampled at 16kHz. Note that this model should be fine-tuned on a downstream task, like Automatic Speech Recognition. Check out [this blog](https://huggingface.co/blog/fine-tune-wav2vec2-english) for an in-depth explanation of how to fine-tune the model.
 
+The model was pre-trained on:
+
+- 60,000 hours of [Libri-Light](https://arxiv.org/abs/1912.07875)
+- 10,000 hours of [GigaSpeech](https://arxiv.org/abs/2106.06909)
+- 24,000 hours of [VoxPopuli](https://arxiv.org/abs/2101.00390)
+
 [Paper: UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER
 AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752)
```
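The README above stresses that input audio must be sampled at 16 kHz. As a minimal sketch of that preprocessing step, the snippet below builds a `Wav2Vec2FeatureExtractor` by hand and runs one second of dummy audio through it; the extractor parameter values here are illustrative assumptions, since the real configuration ships with the checkpoint, and the checkpoint id in the trailing comment is a placeholder for this repo's id.

```python
import torch
from transformers import Wav2Vec2FeatureExtractor

# Hand-built feature extractor; the parameter values are assumptions for
# illustration -- in practice the config is loaded from the checkpoint.
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1,
    sampling_rate=16000,
    padding_value=0.0,
    do_normalize=True,
)

# One second of dummy audio, already sampled at 16 kHz as the model expects.
waveform = torch.randn(16000)

inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
print(inputs.input_values.shape)  # torch.Size([1, 16000])

# The preprocessed tensor would then be fed to the pretrained model, e.g.:
# model = UniSpeechSatModel.from_pretrained("<this checkpoint>")
# hidden_states = model(**inputs).last_hidden_state
```

Passing `sampling_rate=16000` to the extractor lets it verify the input rate matches what the model was pretrained on; audio recorded at another rate should be resampled first.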