NLLB 600M TH-EN finetuned

This model is finetuned from facebook/nllb-200-distilled-600M for Thai-to-English translation, using the SCB-1M and OPUS datasets. The finetuning script is on GitHub, and the full finetuning logs are on wandb.

Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
import torch

MODEL_NAME = "wtarit/nllb-600M-th-en"

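# Load the finetuned checkpoint and its tokenizer from the Hugging Face Hub
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Use the first GPU if one is available, otherwise fall back to CPU
device = 0 if torch.cuda.is_available() else "cpu"

translation_pipeline = pipeline(
    "translation",
    model=model,
    tokenizer=tokenizer,
    src_lang="tha_Thai",  # source language code: Thai
    tgt_lang="eng_Latn",  # target language code: English
    max_length=400,       # upper bound on the generated sequence length
    device=device,
)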

# Run translation pipeline
result = translation_pipeline("สวัสดี เราคือโมเดลแปลภาษา")
print(result[0]['translation_text'])
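To translate many sentences, the pipeline can also be called with a list of strings, in which case it returns one result dict per input. The helper below is a minimal sketch of that pattern, not part of this model's repository: `translate_in_batches` and its `batch_size` default are hypothetical names chosen for illustration. It accepts any callable that behaves like `translation_pipeline` above.

```python
from typing import Callable, Dict, Iterable, List


def translate_in_batches(
    translate: Callable[[List[str]], List[Dict[str, str]]],
    sentences: Iterable[str],
    batch_size: int = 8,  # hypothetical default, tune for your hardware
) -> List[str]:
    """Translate sentences in fixed-size batches and return plain strings.

    `translate` is expected to take a list of strings and return a list of
    dicts with a 'translation_text' key, like translation_pipeline above.
    """
    # Drop empty or whitespace-only inputs so they are not sent to the model
    cleaned = [s.strip() for s in sentences if s.strip()]

    translations: List[str] = []
    for start in range(0, len(cleaned), batch_size):
        batch = cleaned[start:start + batch_size]
        results = translate(batch)
        translations.extend(r["translation_text"] for r in results)
    return translations
```

With the pipeline from the usage snippet, this could be called as `translate_in_batches(translation_pipeline, thai_sentences)`, where `thai_sentences` is your own list of Thai strings.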

Score

BLEU score (computed with sacrebleu): 27.37 on IWSLT 2015
