---
language:
- kk
- tr
- ru
- en
language_details: eng_Latn, kaz_Cyrl, rus_Cyrl, tur_Latn
metrics:
- bleu
- chrf
pipeline_tag: translation
inference: false
datasets:
- facebook/flores
- issai/kazparc
---
# Tilmash
<p align = "justify">
Tilmash is a fine-tuned version of Facebook’s <a href = "https://huggingface.co/facebook/nllb-200-distilled-1.3B">NLLB</a> model (nllb-200-distilled-1.3B) for machine translation across four languages: Kazakh, Russian, English, and Turkish.
The table below reports the results of evaluating Tilmash on the <a href = "https://huggingface.co/datasets/facebook/flores">FLoRes</a> and <a href = "https://huggingface.co/datasets/issai/kazparc">KazParC</a> test datasets. Each cell lists <a href = "https://huggingface.co/spaces/evaluate-metric/bleu">BLEU</a> | <a href = "https://huggingface.co/spaces/evaluate-metric/chrf">chrF</a>, in that order.
</p>
<table align = "center">
<thead align = "center">
<tr>
<th>Pair</th>
<th>FLoRes</th>
<th>KazParC</th>
</tr>
</thead>
<tbody align = "center">
<tr>
<td>EN→KK</td>
<td>0.20 | 0.60</td>
<td>0.21 | 0.60</td>
</tr>
<tr>
<td>EN→RU</td>
<td>0.28 | 0.60</td>
<td>0.38 | 0.68</td>
</tr>
<tr>
<td>EN→TR</td>
<td>0.27 | 0.65</td>
<td>0.25 | 0.64</td>
</tr>
<tr>
<td>KK→EN</td>
<td>0.32 | 0.63</td>
<td>0.32 | 0.62</td>
</tr>
<tr>
<td>KK→RU</td>
<td>0.18 | 0.52</td>
<td>0.29 | 0.63</td>
</tr>
<tr>
<td>KK→TR</td>
<td>0.14 | 0.54</td>
<td>0.16 | 0.55</td>
</tr>
<tr>
<td>RU→EN</td>
<td>0.32 | 0.63</td>
<td>0.42 | 0.70</td>
</tr>
<tr>
<td>RU→KK</td>
<td>0.13 | 0.54</td>
<td>0.22 | 0.62</td>
</tr>
<tr>
<td>RU→TR</td>
<td>0.14 | 0.54</td>
<td>0.18 | 0.57</td>
</tr>
<tr>
<td>TR→EN</td>
<td>0.36 | 0.66</td>
<td>0.38 | 0.66</td>
</tr>
<tr>
<td>TR→KK</td>
<td>0.13 | 0.54</td>
<td>0.16 | 0.55</td>
</tr>
<tr>
<td>TR→RU</td>
<td>0.19 | 0.53</td>
<td>0.24 | 0.57</td>
</tr>
</tbody>
</table>
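
<p align = "justify">To give a feel for what the chrF column measures, here is a minimal sketch of a simplified, sentence-level character n-gram F-score on a 0 to 1 scale. It is not the evaluation code behind the table: the function names <code>char_ngrams</code> and <code>simple_chrf</code> are hypothetical, the defaults (n-grams up to 6, beta = 2, whitespace removed) are assumptions, and the linked chrF metric remains the reference for real evaluation.</p>

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace (an assumption of this sketch)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF in [0, 1]; illustrative only."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip n-gram orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta = 2 weights recall more heavily than precision
    return (1 + beta**2) * p * r / (beta**2 * p + r)
```

<p align = "justify">For example, <code>simple_chrf(translation, reference)</code> returns 1.0 when the two strings match exactly and 0.0 when they share no characters. Scores from this sketch are not comparable to the numbers in the table.</p>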
## Model Sources
- **Repository:** <a href = "https://github.com/IS2AI/KazParC">https://github.com/IS2AI/KazParC</a>
- **Paper:** <a href = "https://aclanthology.org/2024.lrec-main.842.pdf">KazParC: Kazakh Parallel Corpus for Machine Translation</a>
- **Demo:** <a href = "https://issai.nu.edu.kz/tilmash/">Tilmash Demo</a>
## How to Get Started with the Model
<p align = "justify">You can use this model with the Transformers pipeline for translation.</p>
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TranslationPipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForSeq2SeqLM.from_pretrained("issai/tilmash")
tokenizer = AutoTokenizer.from_pretrained("issai/tilmash")

# For src_lang and tgt_lang, choose from kaz_Cyrl (Kazakh), rus_Cyrl (Russian), eng_Latn (English), tur_Latn (Turkish)
tilmash = TranslationPipeline(model=model, tokenizer=tokenizer, src_lang="kaz_Cyrl", tgt_lang="eng_Latn", max_length=1000)

print(tilmash("Қазақстан — Шығыс Еуропа мен Орталық Азияда орналасқан мемлекет."))
# [{'translation_text': 'Kazakhstan is a country located in Eastern Europe and Central Asia.'}]
```
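
<p align = "justify">If you need more than one translation direction, a small helper can build a pipeline per language pair while reusing a single loaded model. The sketch below is one possible way to do that: the short codes (<code>kk</code>, <code>ru</code>, <code>en</code>, <code>tr</code>) and the functions <code>pipeline_kwargs</code> and <code>make_translator</code> are hypothetical conveniences, while the NLLB language codes are the four listed in this model card.</p>

```python
# Short codes are a hypothetical convenience; the NLLB codes come from this model card.
LANG_CODES = {
    "kk": "kaz_Cyrl",  # Kazakh
    "ru": "rus_Cyrl",  # Russian
    "en": "eng_Latn",  # English
    "tr": "tur_Latn",  # Turkish
}


def pipeline_kwargs(src: str, tgt: str, max_length: int = 1000) -> dict:
    """Validate a language pair and return keyword arguments for TranslationPipeline."""
    for code in (src, tgt):
        if code not in LANG_CODES:
            raise ValueError(f"Unsupported language {code!r}; choose from {sorted(LANG_CODES)}")
    if src == tgt:
        raise ValueError("Source and target languages must differ")
    return {
        "src_lang": LANG_CODES[src],
        "tgt_lang": LANG_CODES[tgt],
        "max_length": max_length,
    }


def make_translator(src: str, tgt: str, model, tokenizer):
    """Build a pipeline for one direction, reusing an already loaded model and tokenizer."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import TranslationPipeline

    return TranslationPipeline(model=model, tokenizer=tokenizer, **pipeline_kwargs(src, tgt))
```

<p align = "justify">With <code>model</code> and <code>tokenizer</code> loaded as in the example above, <code>make_translator("ru", "kk", model, tokenizer)</code> would return a Russian-to-Kazakh pipeline without loading the weights a second time.</p>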