	---
	language:
	- kk
	- tr
	- ru
	- en
	language_details: eng_Latn, kaz_Cyrl, rus_Cyrl, tur_Latn
	metrics:
	- bleu
	- chrf
	pipeline_tag: translation
	inference: false
	datasets:
	- facebook/flores
	- issai/kazparc
	---

	# Tilmash

	<p align = "justify">
	Tilmash is a machine translation model for four languages (Kazakh, Russian, English, and Turkish), fine-tuned from Facebook’s <a href = "https://huggingface.co/facebook/nllb-200-distilled-1.3B">NLLB</a> model.
	The table below reports <a href = "https://huggingface.co/spaces/evaluate-metric/bleu">BLEU</a> \| <a href = "https://huggingface.co/spaces/evaluate-metric/chrf">chrF</a> scores for Tilmash on the <a href = "https://huggingface.co/datasets/facebook/flores">FLoRes</a> and <a href = "https://huggingface.co/datasets/issai/kazparc">KazParC</a> test sets.
	</p>

	<table align = "center">
	<thead align = "center">
	<tr>
	<th>Pair</th>
	<th>FLoRes (BLEU \| chrF)</th>
	<th>KazParC (BLEU \| chrF)</th>
	</tr>
	</thead>
	<tbody align = "center">
	<tr>
	<td>EN→KK</td>
	<td>0.20 \| 0.60</td>
	<td>0.21 \| 0.60</td>
	</tr>
	<tr>
	<td>EN→RU</td>
	<td>0.28 \| 0.60</td>
	<td>0.38 \| 0.68</td>
	</tr>
	<tr>
	<td>EN→TR</td>
	<td>0.27 \| 0.65</td>
	<td>0.25 \| 0.64</td>
	</tr>
	<tr>
	<td>KK→EN</td>
	<td>0.32 \| 0.63</td>
	<td>0.32 \| 0.62</td>
	</tr>
	<tr>
	<td>KK→RU</td>
	<td>0.18 \| 0.52</td>
	<td>0.29 \| 0.63</td>
	</tr>
	<tr>
	<td>KK→TR</td>
	<td>0.14 \| 0.54</td>
	<td>0.16 \| 0.55</td>
	</tr>
	<tr>
	<td>RU→EN</td>
	<td>0.32 \| 0.63</td>
	<td>0.42 \| 0.70</td>
	</tr>
	<tr>
	<td>RU→KK</td>
	<td>0.13 \| 0.54</td>
	<td>0.22 \| 0.62</td>
	</tr>
	<tr>
	<td>RU→TR</td>
	<td>0.14 \| 0.54</td>
	<td>0.18 \| 0.57</td>
	</tr>
	<tr>
	<td>TR→EN</td>
	<td>0.36 \| 0.66</td>
	<td>0.38 \| 0.66</td>
	</tr>
	<tr>
	<td>TR→KK</td>
	<td>0.13 \| 0.54</td>
	<td>0.16 \| 0.55</td>
	</tr>
	<tr>
	<td>TR→RU</td>
	<td>0.19 \| 0.53</td>
	<td>0.24 \| 0.57</td>
	</tr>
	</tbody>
	</table>
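
To give a feel for what the chrF column measures, here is a minimal, simplified sketch of a sentence-level character n-gram F-score. It is not the code used to produce the table. The function names `char_ngrams` and `simple_chrf` are hypothetical, and the defaults (n-grams up to length 6, β = 2) are assumed from the common chrF2 convention. The result is on a 0 to 1 scale, matching the table.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of length n after removing whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-1 scale (illustrative only)."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        hyp_total, ref_total = sum(hyp.values()), sum(ref.values())
        if hyp_total == 0 or ref_total == 0:
            continue  # this order is longer than one of the strings
        matches = sum((hyp & ref).values())  # clipped n-gram overlap
        precisions.append(matches / hyp_total)
        recalls.append(matches / ref_total)
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)  # average precision over orders
    r = sum(recalls) / len(recalls)        # average recall over orders
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


print(simple_chrf("Kazakhstan is a state in Central Asia.",
                  "Kazakhstan is a country in Central Asia."))
```

For scores that are comparable with published results, use an established implementation such as the linked chrF metric rather than this sketch.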

	## Model Sources

	- Repository: <a href = "https://github.com/IS2AI/KazParC">https://github.com/IS2AI/KazParC</a>
	- Paper: <a href = "https://aclanthology.org/2024.lrec-main.842.pdf">KazParC: Kazakh Parallel Corpus for Machine Translation</a>
	- Demo: <a href = "https://issai.nu.edu.kz/tilmash/">Tilmash Demo</a>

	## How to Get Started with the Model

	<p align = "justify">You can use this model with the Transformers pipeline for translation.</p>

	```python
	from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TranslationPipeline

	# Load the fine-tuned weights and the matching tokenizer from the Hugging Face Hub.
	model = AutoModelForSeq2SeqLM.from_pretrained("issai/tilmash")
	tokenizer = AutoTokenizer.from_pretrained("issai/tilmash")

	# For src_lang and tgt_lang, choose from kaz_Cyrl (Kazakh), rus_Cyrl (Russian),
	# eng_Latn (English), and tur_Latn (Turkish).
	tilmash = TranslationPipeline(
	    model=model,
	    tokenizer=tokenizer,
	    src_lang="kaz_Cyrl",
	    tgt_lang="eng_Latn",
	    max_length=1000,
	)

	print(tilmash("Қазақстан — Шығыс Еуропа мен Орталық Азияда орналасқан мемлекет."))
	# [{'translation_text': 'Kazakhstan is a country located in Eastern Europe and Central Asia.'}]
	```
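
To switch translation directions without retyping the language codes, a small wrapper can map the two-letter codes from the model card metadata to the `src_lang` and `tgt_lang` values the pipeline expects. The sketch below is one possible way to do this. `FLORES_CODES`, `direction_kwargs`, and `build_translator` are hypothetical names for illustration, not part of the model or of Transformers.

```python
# Hypothetical helper names; only the language codes and the model ID
# "issai/tilmash" come from the model card.
FLORES_CODES = {
    "kk": "kaz_Cyrl",  # Kazakh
    "ru": "rus_Cyrl",  # Russian
    "en": "eng_Latn",  # English
    "tr": "tur_Latn",  # Turkish
}


def direction_kwargs(src: str, tgt: str) -> dict:
    """Return the src_lang/tgt_lang keyword arguments for one direction."""
    for code in (src, tgt):
        if code not in FLORES_CODES:
            raise ValueError(
                f"unsupported language {code!r}; choose from {sorted(FLORES_CODES)}"
            )
    if src == tgt:
        raise ValueError("source and target languages must differ")
    return {"src_lang": FLORES_CODES[src], "tgt_lang": FLORES_CODES[tgt]}


def build_translator(src: str, tgt: str, max_length: int = 1000):
    """Load issai/tilmash and return a pipeline for src -> tgt.

    Calling this downloads the model weights on first use.
    """
    # Imported lazily so direction_kwargs works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TranslationPipeline

    model = AutoModelForSeq2SeqLM.from_pretrained("issai/tilmash")
    tokenizer = AutoTokenizer.from_pretrained("issai/tilmash")
    return TranslationPipeline(
        model=model,
        tokenizer=tokenizer,
        max_length=max_length,
        **direction_kwargs(src, tgt),
    )


print(direction_kwargs("ru", "kk"))
# {'src_lang': 'rus_Cyrl', 'tgt_lang': 'kaz_Cyrl'}
```

With this in place, `build_translator("kk", "en")` would return a pipeline configured like the one in the example above. Validating the codes up front is a design choice that surfaces typos before the model is loaded.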