---
license: "cc-by-nc-sa-4.0"
---

# wav2vec2-base-sk-17k
This is a monolingual Slovak Wav2Vec 2.0 base model pre-trained on 17 thousand hours of Slovak speech.

It was introduced in the paper **Transfer Learning of Transformer-Based Speech Recognition Models from Czech to Slovak**, accepted for the TSD2023 conference.
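
As a minimal usage sketch, the pre-trained encoder can be loaded with the Hugging Face Transformers API to extract contextual speech representations. The repo id below is an assumption, following the same fav-kky namespace as the related Czech model listed at the end of this card.

```
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Assumed repo id (same "fav-kky" namespace as the related Czech model).
MODEL_ID = "fav-kky/wav2vec2-base-sk-17k"

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)
model = Wav2Vec2Model.from_pretrained(MODEL_ID)

# Placeholder input: one second of silence; real input is 16 kHz mono speech.
speech = np.zeros(16_000, dtype=np.float32)

inputs = feature_extractor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    features = model(**inputs).last_hidden_state  # (batch, frames, 768)
print(features.shape)
```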

This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model for speech recognition, a tokenizer should be created, and the model should be fine-tuned on labeled data.
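
For illustration, here is a minimal sketch of that setup with Hugging Face Transformers: a character-level CTC tokenizer is built from a toy vocabulary (a real one should be derived from the fine-tuning transcripts), and a randomly initialized CTC head is placed on top of the pre-trained encoder. The repo id and vocabulary are assumptions, not part of this card.

```
import json
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2ForCTC,
    Wav2Vec2Processor,
)

MODEL_ID = "fav-kky/wav2vec2-base-sk-17k"  # assumed repo id

# Toy character vocabulary; build the real one from your labeled transcripts.
vocab = {token: idx for idx, token in enumerate(["<pad>", "<unk>", "|", "a", "b", "c"])}
with open("vocab.json", "w") as f:
    json.dump(vocab, f)

tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="<unk>", pad_token="<pad>", word_delimiter_token="|"
)
processor = Wav2Vec2Processor(
    feature_extractor=Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID),
    tokenizer=tokenizer,
)

# from_pretrained adds a randomly initialized CTC output layer sized to the
# new vocabulary; this head (and typically the encoder) is then trained on
# labeled Slovak speech.
model = Wav2Vec2ForCTC.from_pretrained(
    MODEL_ID,
    vocab_size=len(vocab),
    pad_token_id=tokenizer.pad_token_id,
    ctc_loss_reduction="mean",
)

# Persist the processor alongside the model for later fine-tuning/inference.
processor.save_pretrained("wav2vec2-base-sk-17k-ft")
```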

The model was initialized from the Czech pre-trained model [fav-kky/wav2vec2-base-cs-80k-ClTRUS](https://huggingface.co/fav-kky/wav2vec2-base-cs-80k-ClTRUS). We found this cross-language transfer learning approach better than pre-training from scratch. See our paper for details.
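
Concretely, the initialization is a warm start: pre-training begins from the Czech checkpoint's weights instead of random ones, then continues the standard wav2vec 2.0 objective (masked contrastive learning over quantized latents) on Slovak audio. A rough sketch of the idea follows; the pre-training loop itself is omitted.

```
from transformers import Wav2Vec2ForPreTraining

# Warm start: load the Czech model's weights as the initialization point,
# then continue wav2vec 2.0 pre-training on unlabeled Slovak speech.
model = Wav2Vec2ForPreTraining.from_pretrained("fav-kky/wav2vec2-base-cs-80k-ClTRUS")

# ... standard wav2vec 2.0 pre-training loop on the Slovak data goes here ...
```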

## Pretraining data
Almost 18 thousand hours of unlabeled Slovak speech:

…

After fine-tuning, the model scored the following results on public datasets:

…

See our paper for details.

## Paper
The paper is available at https://link.springer.com/chapter/10.1007/978-3-031-40498-6_29.

The pre-print of our paper is available at https://arxiv.org/abs/2306.04399.

## Citation
If you find this model useful, please cite our paper:
```
@inproceedings{wav2vec2-base-sk-17k,
  author    = {Lehe\v{c}ka, Jan and
               Psutka, Josef V. and
               Psutka, Josef},
  title     = {{Transfer Learning of Transformer-Based Speech Recognition Models from Czech to Slovak}},
  year      = {2023},
  isbn      = {978-3-031-40497-9},
  publisher = {Springer Nature Switzerland},
  address   = {Cham},
  url       = {https://doi.org/10.1007/978-3-031-40498-6_29},
  doi       = {10.1007/978-3-031-40498-6_29},
  booktitle = {Text, Speech, and Dialogue: 26th International Conference, TSD 2023, Pilsen, Czech Republic, September 4--6, 2023, Proceedings},
  pages     = {328--338},
  numpages  = {11},
}
```

## Related papers
- [INTERSPEECH 2022 - Exploring Capabilities of Monolingual Audio Transformers using Large Datasets in Automatic Speech Recognition of Czech](https://www.isca-speech.org/archive/pdfs/interspeech_2022/lehecka22_interspeech.pdf)
- [INTERSPEECH 2023 - Transformer-based Speech Recognition Models for Oral History Archives in English, German, and Czech](https://www.isca-archive.org/interspeech_2023/lehecka23_interspeech.pdf)

## Related models
- [fav-kky/wav2vec2-base-cs-80k-ClTRUS](https://huggingface.co/fav-kky/wav2vec2-base-cs-80k-ClTRUS)