README.md · LIA-AvignonUniversity/IWSLT2022-Niger-Mali at main

Model and data descriptions

This is a wav2vec 2.0 base model trained on the Niger-Mali audio collection and on the Tamasheq-French speech corpus. These combined contained 111 hours of French, 109 hours of Fulfulde, 100 hours of Hausa, 243 hours of Tamasheq and 95 hours of Zarma. These corpora were presented in Boito et al., 2022.

Intended uses & limitations

Pretrained wav2vec2 models are distributed under the Apache-2.0 license. Hence, they can be reused extensively without strict limitations.

Referencing our IWSLT models

@article{boito2022trac,
  title={ON-TRAC Consortium Systems for the IWSLT 2022 Dialect and Low-resource Speech Translation Tasks},
  author={Boito, Marcely Zanon and Ortega, John and Riguidel, Hugo and Laurent, Antoine and Barrault, Lo{\"\i}c and Bougares, Fethi and Chaabani, Firas and Nguyen, Ha and Barbier, Florentin and Gahbiche, Souhir and others},
  journal={IWSLT},
  year={2022}
}