
# nllb-200-distilled-600M_covost2_en-to-15

This is a multilingually fine-tuned version of NLLB, based on nllb-200-distilled-600M, trained on the text data of CoVoST2 (En → 15 languages).

It is part of the paper [Pushing the Limits of Zero-shot End-to-End Speech Translation](https://aclanthology.org/2024.findings-acl.847). Details of the fine-tuning process are available in Appendix D.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned checkpoint; the NLLB tokenizer defaults to English (eng_Latn) as the source language.
tokenizer = AutoTokenizer.from_pretrained("johntsi/nllb-200-distilled-600M_covost2_en-to-15")
model = AutoModelForSeq2SeqLM.from_pretrained("johntsi/nllb-200-distilled-600M_covost2_en-to-15")

model.eval()
model.to("cuda")

# English source text; the target language is selected via forced_bos_token_id below.
text = "Translate this text to German."
inputs = tokenizer(text, return_tensors="pt").to("cuda")
outputs = model.generate(
    **inputs,
    num_beams=5,
    # convert_tokens_to_ids works across transformers versions
    # (the older tokenizer.lang_code_to_id mapping was removed in recent releases).
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("deu_Latn"),
)
translated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translated_text)
```
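
To translate into the other CoVoST2 target languages, only `forced_bos_token_id` changes. The sketch below continues from the snippet above (it reuses `tokenizer` and `model`); the mapping from the table's language abbreviations to NLLB-200 codes is our assumption based on the FLORES-200 code list, not something stated in this card, so verify the codes against your tokenizer's vocabulary.

```python
# Hedged sketch: translating one English sentence into all 15 CoVoST2 targets.
# The NLLB-200 codes below are assumptions based on the FLORES-200 code list
# (e.g., lvs_Latn = Standard Latvian, khk_Cyrl = Halh Mongolian).
nllb_codes = {
    "Ar": "arb_Arab", "Ca": "cat_Latn", "Cy": "cym_Latn", "De": "deu_Latn",
    "Et": "est_Latn", "Fa": "pes_Arab", "Id": "ind_Latn", "Ja": "jpn_Jpan",
    "Lv": "lvs_Latn", "Mn": "khk_Cyrl", "Sl": "slv_Latn", "Sv": "swe_Latn",
    "Ta": "tam_Taml", "Tr": "tur_Latn", "Zh": "zho_Hans",
}

text = "The weather is nice today."
inputs = tokenizer(text, return_tensors="pt").to("cuda")

for lang, code in nllb_codes.items():
    outputs = model.generate(
        **inputs,
        num_beams=5,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(code),
    )
    print(lang, tokenizer.decode(outputs[0], skip_special_tokens=True))
```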

## Results

BLEU scores on the CoVoST2 test set:

| Model | Ar | Ca | Cy | De | Et | Fa | Id | Ja | Lv | Mn | Sl | Sv | Ta | Tr | Zh | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| nllb-200-distilled-600M (original) | 20.0 | 39.0 | 26.3 | 35.5 | 23.4 | 15.7 | 39.6 | 21.8 | 14.8 | 10.4 | 30.3 | 41.1 | 20.2 | 21.1 | 34.8 | 26.3 |
| nllb-200-distilled-600M_covost2_en-to-15 | 28.5 | 46.3 | 35.5 | 37.1 | 31.5 | 29.2 | 45.2 | 38.4 | 29.1 | 22.0 | 37.7 | 45.4 | 29.9 | 23.0 | 46.7 | 35.0 |
| nllb-200-distilled-1.3B (original) | 23.3 | 43.5 | 33.5 | 37.9 | 27.9 | 16.6 | 41.9 | 23.0 | 20.0 | 13.1 | 35.1 | 43.8 | 21.7 | 23.8 | 37.5 | 29.5 |
| nllb-200-distilled-1.3B_covost2_en-to-15 | 29.9 | 47.8 | 35.6 | 38.8 | 32.7 | 29.9 | 46.4 | 39.5 | 29.9 | 21.7 | 39.3 | 46.8 | 31.0 | 24.4 | 48.2 | 36.1 |
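
For reference, here is a minimal sketch of how corpus-level BLEU scores like those above can be computed, assuming the `sacrebleu` package; the hypothesis/reference lists are hypothetical placeholders, and the paper's exact evaluation settings are described in the paper itself:

```python
# Hedged sketch: corpus-level BLEU with sacrebleu.
# `hypotheses` and `references` are hypothetical placeholders, not CoVoST2 data.
import sacrebleu

hypotheses = ["Das Wetter ist heute schön."]    # model translations, one per test sentence
references = [["Das Wetter ist heute schön."]]  # one reference stream, aligned with hypotheses

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")
```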

## Citation

If you find these models useful for your research, please cite our paper :)

```bibtex
@inproceedings{tsiamas-etal-2024-pushing,
    title = {{Pushing the Limits of Zero-shot End-to-End Speech Translation}},
    author = "Tsiamas, Ioannis  and
      G{\'a}llego, Gerard  and
      Fonollosa, Jos{\'e}  and
      Costa-juss{\`a}, Marta",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand and virtual meeting",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-acl.847",
    pages = "14245--14267",
}
```