pretraining dataset is Libri-Light, not LibriSpeech
#2 opened by gaunernst
According to the paper (https://arxiv.org/pdf/2202.03555), Table 2, data2vec Audio Large was pre-trained on Libri-Light.
The GitHub page (https://github.com/facebookresearch/fairseq/tree/main/examples/data2vec) also shows that the large variants were pre-trained on Libri-Light.
The datasets tag should be updated accordingly.