---
license: apache-2.0
language:
- yue
library_name: transformers
---

# Cantonese Wav2Vec2-Conformer-Base with Relative Position Embeddings

A wav2vec 2.0 Conformer model with relative position embeddings, pretrained on
2.8K hours of spontaneous Cantonese speech sampled at 16 kHz.

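Note: This model has only been pretrained. It has not been fine-tuned on labeled (transcribed) data, so it needs fine-tuning before it can be used for a downstream task such as speech recognition.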
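
Since the checkpoint is pretrained only, one common way to use it as-is is to extract frame-level speech representations. The snippet below is a minimal sketch of that, not an official usage recipe. It assumes the repository ships a preprocessor config that `AutoFeatureExtractor` can load, and `model_id` is a placeholder that must be replaced with this repository's actual id. `check_sample_rate` and `extract_features` are hypothetical helper names introduced only for illustration.

```python
EXPECTED_SAMPLE_RATE = 16_000  # the pretraining audio was sampled at 16 kHz


def check_sample_rate(sample_rate: int) -> None:
    """Raise if the audio is not at the rate the model was pretrained on."""
    if sample_rate != EXPECTED_SAMPLE_RATE:
        raise ValueError(
            f"Expected {EXPECTED_SAMPLE_RATE} Hz audio, got {sample_rate} Hz; "
            "resample the waveform first."
        )


def extract_features(waveform, sample_rate: int, model_id: str):
    """Return the last hidden states for a mono waveform (1-D float array).

    `model_id` is a placeholder: pass this repository's id or a local path.
    """
    # Heavy imports are kept local so the helper above works without them.
    import torch
    from transformers import AutoFeatureExtractor, Wav2Vec2ConformerModel

    check_sample_rate(sample_rate)

    # Assumption: the repository contains a preprocessor config.
    feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
    model = Wav2Vec2ConformerModel.from_pretrained(model_id)
    model.eval()

    inputs = feature_extractor(
        waveform, sampling_rate=sample_rate, return_tensors="pt"
    )
    with torch.no_grad():
        outputs = model(**inputs)

    # Shape: (batch=1, num_frames, hidden_size)
    return outputs.last_hidden_state
```

For speech recognition, these representations are typically not used directly. Instead, a task head is added and the whole model is fine-tuned on transcribed speech.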


## Alternative Version

An alternative version of the model, pretrained on the same dataset but with
`layer_norm_first` set to `false`, is available [here](https://drive.google.com/file/d/1rbP-6pZfR5ieqAwd5_X2KzipLuKpXSsQ/view?usp=sharing)
as a fairseq checkpoint and may give better downstream results.


## Citation

Please cite the following paper if you use the model.

```
@inproceedings{huang23h_interspeech,
  author={Ranzo Huang and Brian Mak},
  title={{wav2vec 2.0 ASR for Cantonese-Speaking Older Adults in a Clinical Setting}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
  pages={4958--4962},
  doi={10.21437/Interspeech.2023-2470}
}
```