	---
	language: en
	datasets:
	- librispeech_asr
	tags:
	- speech
	---

	# ccc-Wav2Vec2-Base (Pre-trained on LibriSpeech-960h)

	This is the base model, pretrained on speech audio sampled at 16 kHz. When using the model, make sure that your speech input is also sampled at 16 kHz.
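
Audio recorded at a different rate has to be resampled before it is passed to the model. A minimal sketch, assuming NumPy and SciPy are installed and the audio is already loaded as a mono float array; the helper name `resample_to_16k` is illustrative and not part of this model's tooling:

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16_000  # sampling rate the model was pretrained on


def resample_to_16k(waveform: np.ndarray, orig_sr: int) -> np.ndarray:
    """Resample a mono waveform from orig_sr to 16 kHz."""
    if orig_sr == TARGET_SR:
        return waveform
    # Reduce the rate ratio to the smallest integer up/down factors.
    factor = gcd(TARGET_SR, orig_sr)
    up, down = TARGET_SR // factor, orig_sr // factor
    # resample_poly applies an anti-aliasing FIR filter while changing the rate.
    return resample_poly(waveform, up, down)


# One second of a 440 Hz tone at 44.1 kHz (synthetic stand-in for real audio).
t = np.arange(44_100) / 44_100
tone_44k = np.sin(2 * np.pi * 440 * t).astype(np.float32)
tone_16k = resample_to_16k(tone_44k, orig_sr=44_100)
```

Other resamplers (for example those in `torchaudio` or `librosa`) would serve equally well; the only requirement stated here is that the input ends up at 16 kHz.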

	Note: This model does not have a tokenizer, as it was pretrained on audio alone. To use this model for speech recognition, a tokenizer should be created and the model should be fine-tuned on labeled text data. Check out [this blog](https://huggingface.co/blog/fine-tune-wav2vec2-english) for a more detailed explanation of how to fine-tune the model.
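
As a rough illustration of the tokenizer step, a character-level CTC vocabulary can be derived from the training transcripts. The sketch below follows the general approach of the linked blog post; the helper name `build_char_vocab`, the sample transcripts, and the special-token spellings (`|`, `[UNK]`, `[PAD]`) are assumptions chosen for illustration, not something this model prescribes:

```python
import json


def build_char_vocab(transcripts):
    """Map every character seen in the transcripts to an integer id."""
    chars = sorted(set("".join(transcripts)))
    vocab = {ch: idx for idx, ch in enumerate(chars)}
    # Use a visible word delimiter in place of the space character.
    if " " in vocab:
        vocab["|"] = vocab.pop(" ")
    # Unknown-character token and the padding token (used as the CTC blank).
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab


# Hypothetical, already-normalized transcripts standing in for a labeled dataset.
transcripts = ["hello world", "hi"]
vocab = build_char_vocab(transcripts)

# Serialized form; saving this string to a file gives a vocabulary file.
vocab_json = json.dumps(vocab)
```

Saved to a file such as `vocab.json`, this mapping can be passed to `Wav2Vec2CTCTokenizer` from `transformers`, using the same special tokens, before fine-tuning.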

	[Paper](https://arxiv.org/abs/2210.02592)

	Authors: Vasista Sai Lodagala, Sreyan Ghosh, S. Umesh

	Abstract
	While Self-Supervised Learning has helped reap the benefit of the scale from the available unlabeled data, the learning paradigms are continuously being bettered. We present a new pre-training strategy named ccc-wav2vec 2.0, which uses clustering and an augmentation-based cross-contrastive loss as its self-supervised objective. Through the clustering module, we scale down the influence of those negative examples that are highly similar to the positive. The Cross-Contrastive loss is computed between the encoder output of the original sample and the quantizer output of its augmentation and vice-versa, bringing robustness to the pre-training strategy. ccc-wav2vec 2.0 achieves up to 15.6% and 12.7% relative WER improvement over the baseline wav2vec 2.0 on the test-clean and test-other sets, respectively, of LibriSpeech, without the use of any language model. The proposed method also achieves up to 14.9% relative WER improvement over the baseline wav2vec 2.0 when fine-tuned on Switchboard data.

	GitHub Page: https://github.com/speech-lab-iitm/ccc-wav2vec-2.0
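
The improvements quoted in the abstract are relative, not absolute, WER reductions. A small sketch of how a relative improvement is computed; the function name and the numbers are hypothetical and are not results from the paper:

```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, in percent, of new_wer with respect to baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers, NOT taken from the paper: a drop from 10.0% to 9.0% WER
# is 1.0 point absolute but 10% relative.
improvement = relative_wer_improvement(10.0, 9.0)
```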

	# Usage

	See [this notebook](https://colab.research.google.com/drive/1FjTsqbYKphl9kL-eILgUc-bl4zVThL8F?usp=sharing) for more information on how to fine-tune the model.
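
Before fine-tuning, the pretrained encoder can also be used to extract frame-level speech representations. The following is a minimal sketch, assuming the checkpoint is hosted on the Hugging Face Hub as `vasista22/ccc-wav2vec2-base` in a format that the `transformers` class `Wav2Vec2Model` can load, and that the feature extractor has to be built by hand (here with the settings used in the linked blog post). The helper names are illustrative:

```python
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

MODEL_ID = "vasista22/ccc-wav2vec2-base"  # assumed Hub id of this checkpoint

# Assumed settings, mirroring the linked fine-tuning blog post.
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1,
    sampling_rate=16_000,
    padding_value=0.0,
    do_normalize=True,
    return_attention_mask=False,
)


def prepare_inputs(waveform: np.ndarray):
    """Normalize a mono 16 kHz waveform and add a batch dimension."""
    return feature_extractor(waveform, sampling_rate=16_000, return_tensors="pt")


def extract_features(waveform: np.ndarray) -> torch.Tensor:
    """Return hidden states shaped (1, frames, hidden_size); downloads the checkpoint."""
    model = Wav2Vec2Model.from_pretrained(MODEL_ID)
    model.eval()
    with torch.no_grad():
        outputs = model(**prepare_inputs(waveform))
    return outputs.last_hidden_state


# Example call (needs network access to fetch the weights):
# hidden = extract_features(np.zeros(16_000, dtype=np.float32))
```

These representations are not transcriptions; producing text still requires the tokenizer and fine-tuning described above.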