---
license: apache-2.0
language:
- yue
library_name: transformers
---
# Cantonese Wav2Vec2-Conformer-Base with Relative Position Embeddings
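A wav2vec 2.0 Conformer with relative position embeddings, pretrained on
2.8K hours of spontaneous Cantonese speech sampled at 16 kHz.
Note: This model has not been fine-tuned on labeled (transcribed) data, so it cannot transcribe speech on its own. Fine-tune it on transcribed audio (e.g., with a CTC head) before using it for speech recognition.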
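Without fine-tuning, the checkpoint is mainly useful as a speech feature encoder. The snippet below is a minimal sketch that assumes the weights load into transformers' `Wav2Vec2ConformerModel` (the card lists `transformers` as its library). So that it runs offline, it builds a tiny, randomly initialised Conformer with relative position embeddings as a stand-in. For real features, swap in `Wav2Vec2ConformerModel.from_pretrained("<repo-id>")`, where `"<repo-id>"` is a hypothetical placeholder for this model's Hub identifier. The feature-extractor settings are likewise assumptions, not values taken from this repository.

```python
import numpy as np
import torch
from transformers import (
    Wav2Vec2ConformerConfig,
    Wav2Vec2ConformerModel,
    Wav2Vec2FeatureExtractor,
)

SAMPLING_RATE = 16_000  # the model was pretrained on 16 kHz audio


def build_model() -> Wav2Vec2ConformerModel:
    # Offline stand-in: a tiny, randomly initialised Conformer with relative
    # position embeddings. For real features, load the pretrained weights
    # instead, e.g. Wav2Vec2ConformerModel.from_pretrained("<repo-id>"),
    # where "<repo-id>" is a hypothetical placeholder for this model's Hub id.
    config = Wav2Vec2ConformerConfig(
        hidden_size=64,
        num_hidden_layers=2,
        num_attention_heads=4,
        intermediate_size=128,
        conv_dim=(32,) * 7,
        position_embeddings_type="relative",
    )
    return Wav2Vec2ConformerModel(config).eval()  # eval(): no SpecAugment masking


def extract_features(model: Wav2Vec2ConformerModel, waveform: np.ndarray) -> torch.Tensor:
    """Return frame-level hidden states of shape (1, frames, hidden_size)."""
    # Assumed preprocessing: mono float audio at 16 kHz, zero-mean/unit-variance normalised.
    feature_extractor = Wav2Vec2FeatureExtractor(
        feature_size=1,
        sampling_rate=SAMPLING_RATE,
        padding_value=0.0,
        do_normalize=True,
        return_attention_mask=False,
    )
    inputs = feature_extractor(waveform, sampling_rate=SAMPLING_RATE, return_tensors="pt")
    with torch.no_grad():
        outputs = model(inputs.input_values)
    return outputs.last_hidden_state


if __name__ == "__main__":
    one_second = np.random.randn(SAMPLING_RATE).astype(np.float32)
    hidden = extract_features(build_model(), one_second)
    print(hidden.shape)  # torch.Size([1, 49, 64])
```

With the default wav2vec 2.0 convolutional feature encoder (total stride 320 samples), one second of 16 kHz audio yields 49 frames, roughly one every 20 ms. The last dimension equals `hidden_size`, which is 64 for this stand-in and larger for the pretrained model.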
## Alternative Version
An alternative version of the model, pretrained on the same dataset but
with `layer_norm_first` set to `false`, is available [here](https://drive.google.com/file/d/1rbP-6pZfR5ieqAwd5_X2KzipLuKpXSsQ/view?usp=sharing)
as a fairseq checkpoint and may give better downstream results.
## Citation
Please cite the following paper if you use the model.
```
@inproceedings{huang23h_interspeech,
author={Ranzo Huang and Brian Mak},
title={{wav2vec 2.0 ASR for Cantonese-Speaking Older Adults in a Clinical Setting}},
year=2023,
booktitle={Proc. INTERSPEECH 2023},
pages={4958--4962},
doi={10.21437/Interspeech.2023-2470}
}
```