language:
  - kk
  - tr
  - ru
  - en
language_details: eng_Latn, kaz_Cyrl, rus_Cyrl, tur_Latn
metrics:
  - bleu
  - chrf
pipeline_tag: translation
inference: false
datasets:
  - facebook/flores
  - issai/kazparc

Tilmash

Tilmash is a fine-tuned version of Facebook’s NLLB model for machine translation between four languages: Kazakh, Russian, English, and Turkish. The table below reports BLEU and chrF scores, in that order, for Tilmash on the FLoRes and KazParC test sets.

| Pair | FLoRes BLEU | FLoRes chrF | KazParC BLEU | KazParC chrF |
|---|---|---|---|---|
| EN↔KK | 0.20 | 0.60 | 0.21 | 0.60 |
| EN↔RU | 0.28 | 0.60 | 0.38 | 0.68 |
| EN↔TR | 0.27 | 0.65 | 0.25 | 0.64 |
| KK↔EN | 0.32 | 0.63 | 0.32 | 0.62 |
| KK↔RU | 0.18 | 0.52 | 0.29 | 0.63 |
| KK↔TR | 0.14 | 0.54 | 0.16 | 0.55 |
| RU↔EN | 0.32 | 0.63 | 0.42 | 0.70 |
| RU↔KK | 0.13 | 0.54 | 0.22 | 0.62 |
| RU↔TR | 0.14 | 0.54 | 0.18 | 0.57 |
| TR↔EN | 0.36 | 0.66 | 0.38 | 0.66 |
| TR↔KK | 0.13 | 0.54 | 0.16 | 0.55 |
| TR↔RU | 0.19 | 0.53 | 0.24 | 0.57 |

Model Sources

How to Get Started with the Model

You can use this model with the Transformers pipeline for translation.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TranslationPipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForSeq2SeqLM.from_pretrained("issai/tilmash")
tokenizer = AutoTokenizer.from_pretrained("issai/tilmash")

# For src_lang and tgt_lang, choose from kaz_Cyrl (Kazakh), rus_Cyrl (Russian), eng_Latn (English), tur_Latn (Turkish)
tilmash = TranslationPipeline(model=model, tokenizer=tokenizer, src_lang="kaz_Cyrl", tgt_lang="eng_Latn", max_length=1000)

print(tilmash("Қазақстан — Шығыс Еуропа мен Орталық Азияда орналасқан мемлекет."))
# [{'translation_text': 'Kazakhstan is a country located in Eastern Europe and Central Asia.'}]
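
The example above fixes one direction (Kazakh to English). To switch between any of the four languages, the language codes can be wrapped in a small helper. The following is a minimal sketch, not part of the model or of the Transformers API: the names `LANG_CODES`, `resolve_pair`, and `build_translator` are hypothetical, the short codes come from the model card metadata, and rejecting identical source and target languages is an assumption made for illustration.

```python
from typing import Tuple

# Short language codes from the model card metadata, mapped to the
# codes accepted by src_lang / tgt_lang in the example above.
LANG_CODES = {
    "kk": "kaz_Cyrl",  # Kazakh
    "ru": "rus_Cyrl",  # Russian
    "en": "eng_Latn",  # English
    "tr": "tur_Latn",  # Turkish
}


def resolve_pair(src: str, tgt: str) -> Tuple[str, str]:
    """Validate a language pair and return its (src_lang, tgt_lang) codes."""
    for code in (src, tgt):
        if code not in LANG_CODES:
            raise ValueError(
                f"Unsupported language {code!r}; choose from {sorted(LANG_CODES)}"
            )
    # Assumption: translating a language into itself is treated as an error.
    if src == tgt:
        raise ValueError("Source and target languages must differ")
    return LANG_CODES[src], LANG_CODES[tgt]


def build_translator(src: str, tgt: str, model_id: str = "issai/tilmash"):
    """Build a TranslationPipeline for one translation direction."""
    # Imported lazily so that resolve_pair works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TranslationPipeline

    src_lang, tgt_lang = resolve_pair(src, tgt)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return TranslationPipeline(
        model=model,
        tokenizer=tokenizer,
        src_lang=src_lang,
        tgt_lang=tgt_lang,
        max_length=1000,
    )


# Usage (downloads the model weights on first call):
# translator = build_translator("ru", "tr")
# print(translator("Привет, мир!"))
```

Each call to `build_translator` loads the model again, so for several directions it may be preferable to load the model and tokenizer once and construct one pipeline per pair from them.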