---
language:
  - kk
  - tr
  - ru
  - en
language_details: eng_Latn, kaz_Cyrl, rus_Cyrl, tur_Latn
metrics:
  - bleu
  - chrf
pipeline_tag: translation
inference: false
datasets:
  - facebook/flores
  - issai/kazparc
---

Tilmash

Tilmash is a machine translation model fine-tuned from Facebook's NLLB model to translate between four languages: Kazakh, Russian, English, and Turkish. The table below reports BLEU and chrF scores (both on a 0 to 1 scale) for each translation direction on the FLoRes and KazParC test sets.

| Direction | FLoRes BLEU | FLoRes chrF | KazParC BLEU | KazParC chrF |
|---|---|---|---|---|
| EN→KK | 0.20 | 0.60 | 0.21 | 0.60 |
| EN→RU | 0.28 | 0.60 | 0.38 | 0.68 |
| EN→TR | 0.27 | 0.65 | 0.25 | 0.64 |
| KK→EN | 0.32 | 0.63 | 0.32 | 0.62 |
| KK→RU | 0.18 | 0.52 | 0.29 | 0.63 |
| KK→TR | 0.14 | 0.54 | 0.16 | 0.55 |
| RU→EN | 0.32 | 0.63 | 0.42 | 0.70 |
| RU→KK | 0.13 | 0.54 | 0.22 | 0.62 |
| RU→TR | 0.14 | 0.54 | 0.18 | 0.57 |
| TR→EN | 0.36 | 0.66 | 0.38 | 0.66 |
| TR→KK | 0.13 | 0.54 | 0.16 | 0.55 |
| TR→RU | 0.19 | 0.53 | 0.24 | 0.57 |
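The chrF scores measure how many character n-grams the system output shares with the reference translations. The snippet below is a minimal, simplified corpus-level chrF in plain Python that illustrates what this metric captures. It follows the usual definition: character n-grams of order 1 to 6 with whitespace removed, precision and recall averaged over orders, then combined as an F-score with β = 2. It is a sketch under assumptions, not the evaluation script behind the table. The card does not say which tool produced its numbers, and a standard implementation such as sacrebleu is the safer choice for comparable scores. The names `char_ngrams` and `corpus_chrf` are hypothetical. The result uses the same 0 to 1 scale as the table.

```python
from collections import Counter
from typing import List, Sequence


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of `text` after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def corpus_chrf(hypotheses: Sequence[str], references: Sequence[str],
                max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified corpus-level chrF on a 0-1 scale (illustrative sketch only;
    not guaranteed to match sacrebleu or the scores reported in this card)."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")
    # Per-order totals summed over the corpus: [hyp n-grams, ref n-grams, matches]
    stats: List[List[int]] = [[0, 0, 0] for _ in range(max_order)]
    for hyp, ref in zip(hypotheses, references):
        for n in range(1, max_order + 1):
            hyp_counts = char_ngrams(hyp, n)
            ref_counts = char_ngrams(ref, n)
            stats[n - 1][0] += sum(hyp_counts.values())
            stats[n - 1][1] += sum(ref_counts.values())
            # Clipped matches: each n-gram counts at most min(hyp, ref) times
            stats[n - 1][2] += sum((hyp_counts & ref_counts).values())

    precisions, recalls = [], []
    for n_hyp, n_ref, n_match in stats:
        if n_hyp > 0 and n_ref > 0:  # skip orders longer than the texts
            precisions.append(n_match / n_hyp)
            recalls.append(n_match / n_ref)
    if not precisions:
        return 0.0

    avg_p = sum(precisions) / len(precisions)
    avg_r = sum(recalls) / len(recalls)
    if avg_p + avg_r == 0:
        return 0.0
    b2 = beta ** 2  # beta = 2 weights recall twice as much as precision
    return (1 + b2) * avg_p * avg_r / (b2 * avg_p + avg_r)
```

For example, `corpus_chrf(["Kazakhstan is a country."], ["Kazakhstan is a country."])` returns 1.0. The score falls toward 0 as fewer character n-grams overlap.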

Model Sources

Repository: https://github.com/IS2AI/KazParC
Paper: KazParC: Kazakh Parallel Corpus for Machine Translation
Demo: Tilmash Demo

How to Get Started with the Model

You can run the model through the Hugging Face Transformers `TranslationPipeline`, setting the source and target languages with their NLLB language codes:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TranslationPipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForSeq2SeqLM.from_pretrained("issai/tilmash")
tokenizer = AutoTokenizer.from_pretrained("issai/tilmash")

# For src_lang and tgt_lang, choose from kaz_Cyrl (Kazakh), rus_Cyrl (Russian),
# eng_Latn (English), or tur_Latn (Turkish)
tilmash = TranslationPipeline(
    model=model,
    tokenizer=tokenizer,
    src_lang="kaz_Cyrl",
    tgt_lang="eng_Latn",
    max_length=1000,
)

print(tilmash("Қазақстан — Шығыс Еуропа мен Орталық Азияда орналасқан мемлекет."))
# [{'translation_text': 'Kazakhstan is a country located in Eastern Europe and Central Asia.'}]
```
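`TranslationPipeline` also accepts `src_lang` and `tgt_lang` at call time, so a single loaded pipeline can serve every direction in the evaluation table. The sketch below is a hypothetical helper that translates one sentence into the other three supported languages. `translate_to_all` and `NLLB_CODES` are illustrative names and are not part of Tilmash or Transformers. The helper receives the pipeline as an argument, so you can check its logic with a stand-in function without downloading the model.

```python
from typing import Callable, Dict, List

# NLLB language codes supported by Tilmash, as listed in this model card
NLLB_CODES: Dict[str, str] = {
    "Kazakh": "kaz_Cyrl",
    "Russian": "rus_Cyrl",
    "English": "eng_Latn",
    "Turkish": "tur_Latn",
}


def translate_to_all(pipe: Callable[..., List[dict]], text: str,
                     src_lang: str) -> Dict[str, str]:
    """Translate `text` from `src_lang` into every other supported language.

    `pipe` is assumed to behave like a transformers TranslationPipeline:
    called with the text plus src_lang/tgt_lang keyword arguments, it
    returns a list of dicts with a 'translation_text' key.
    """
    if src_lang not in NLLB_CODES.values():
        raise ValueError(f"unsupported source language: {src_lang!r}")
    results: Dict[str, str] = {}
    for tgt_lang in NLLB_CODES.values():
        if tgt_lang == src_lang:
            continue  # skip the identity direction
        # Call-time src_lang/tgt_lang override the values given at construction
        output = pipe(text, src_lang=src_lang, tgt_lang=tgt_lang)
        results[tgt_lang] = output[0]["translation_text"]
    return results
```

With the `tilmash` pipeline from the example above, `translate_to_all(tilmash, text, "kaz_Cyrl")` returns a dictionary keyed by `rus_Cyrl`, `eng_Latn`, and `tur_Latn`.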