Massively Multilingual Speech (MMS) - Finetuned ASR - ALL

This is a checkpoint of the MMS Zero-shot project, a model that transcribes speech in almost any language using only a small amount of unlabeled text in the new language. The approach is based on a multilingual acoustic model trained on data in 1,078 languages (leveraging the MMS data), which outputs transcriptions in an intermediate representation (uroman tokens). A small amount of text in the new, unseen language is then mapped to this same intermediate representation. At inference time, this mapping, together with an optional language model, enables transcription of the new language.
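The mapping from new-language text to the intermediate representation can be pictured as a small lexicon. The following is a minimal sketch, not the project's actual pipeline: `toy_romanize` is a hypothetical stand-in for uroman that only strips diacritics from Latin-script text, and `build_lexicon` and `invert_lexicon` are illustrative helper names that do not come from the MMS code.

```python
import unicodedata
from collections import defaultdict


def toy_romanize(word: str) -> str:
    """Hypothetical stand-in for uroman: lowercase, strip diacritics, keep a-z."""
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return "".join(ch for ch in decomposed if "a" <= ch <= "z")


def build_lexicon(text: str) -> dict[str, list[str]]:
    """Map each whitespace-separated word to its romanized character tokens."""
    lexicon = {}
    for word in text.split():
        tokens = list(toy_romanize(word))
        if tokens:  # skip words that romanize to nothing in this toy scheme
            lexicon[word] = tokens
    return lexicon


def invert_lexicon(lexicon: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map a romanized string back to every word that produces it."""
    inverse = defaultdict(list)
    for word, tokens in lexicon.items():
        inverse["".join(tokens)].append(word)
    return dict(inverse)


# A few words of text stand in for the "small amount of text" in the new language.
lexicon = build_lexicon("café señor cafe")
inverse = invert_lexicon(lexicon)
```

In this sketch, an acoustic model that emits the romanized characters `c a f e` could be matched against `inverse` to recover candidate words in the original orthography. Several words can share one romanization, which is where an optional language model would help choose between them.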

Example

Please see the official Space for an example of how to use the model.

Model details

Developed by: Jinming Zhao et al.
Model type: Scaling A Simple Approach to Zero-Shot Speech Recognition
License: CC-BY-NC 4.0
Num parameters: 300 million

Cite as:

@article{zhao2024scaling,
  title={Scaling A Simple Approach to Zero-Shot Speech Recognition},
  author={Zhao, Jinming and Pratap, Vineel and Auli, Michael},
  journal={arXiv preprint arXiv:2407.17852},
  year={2024}
}
