	---
	language:
	- vi
	- lo
	tags:
	- translation
	license: mit
	widget:
	- text: "Tôi muốn mua một cuốn sách"
	inference:
	  parameters:
	    max_length: 200
	pipeline_tag: translation
	library_name: transformers
	---
	# Vietnamese to Lao Translation Model
	Translation models for low-resource languages are an important part of natural language processing (NLP), because they support cross-cultural communication and knowledge exchange. This model card presents a translation model for one such language pair: Vietnamese to Lao.

	Lao, spoken primarily in Laos and parts of Thailand, is a low-resource language: parallel corpora and other linguistic resources are limited, which makes machine translation difficult. Vietnamese, spoken by millions worldwide, shares some linguistic similarities with Lao and serves as the source language for this model.

	Leveraging the power of the Transformer-based T5 model, we have developed a robust translation system for the Vietnamese-Lao language pair. The T5 model, renowned for its versatility and effectiveness across various NLP tasks, serves as the cornerstone of our approach. Through fine-tuning on a curated dataset of Lao-Vietnamese parallel texts, we have endeavored to enhance translation accuracy and fluency, thus enabling smoother communication between speakers of these languages.

	This model is intended as a practical resource for machine translation of low-resource languages such as Lao. By applying current NLP techniques to the specific linguistic nuances of the Vietnamese-Lao language pair, we aim to facilitate cross-linguistic communication and cultural exchange.
	## How to use
	### On GPU
	```python
	from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

	# Load the tokenizer and model, then move the model to the GPU
	tokenizer = AutoTokenizer.from_pretrained("minhtoan/t5-translate-vietnamese-lao")
	model = AutoModelForSeq2SeqLM.from_pretrained("minhtoan/t5-translate-vietnamese-lao")
	model.cuda()
	model.eval()  # inference mode (disables dropout)

	src = "Tôi muốn mua một cuốn sách"
	# Tokenize the Vietnamese source and move the input IDs to the GPU
	tokenized_text = tokenizer.encode(src, return_tensors="pt").cuda()
	# Generate the Lao translation (at most 200 tokens)
	translate_ids = model.generate(tokenized_text, max_length=200)
	output = tokenizer.decode(translate_ids[0], skip_special_tokens=True)
	output
	```
	'ຂ້ອຍຢາກຊື້ປຶ້ມ'
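
When translating many sentences, it is usually more efficient to pass several of them to `generate` in one call. The following is a minimal sketch of that idea, not part of the original model card: the helper names `batched`, `load_translator`, and `translate_all`, and the default `batch_size=8`, are hypothetical choices for illustration. It falls back to the CPU when no GPU is available and reuses the 200-token limits from the snippets in this section.

```python
from typing import Callable, Iterator, List

MODEL_ID = "minhtoan/t5-translate-vietnamese-lao"


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def load_translator(model_id: str = MODEL_ID) -> Callable[[List[str]], List[str]]:
    """Load the model once and return a function that translates one batch."""
    # Imported lazily so the batching helpers work without these packages.
    import torch
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id).to(device).eval()

    def translate_batch(sentences: List[str]) -> List[str]:
        # padding=True pads only up to the longest sentence in this batch.
        inputs = tokenizer(sentences, return_tensors="pt", padding=True,
                           truncation=True, max_length=200).to(device)
        with torch.no_grad():  # no gradients are needed for inference
            ids = model.generate(input_ids=inputs.input_ids,
                                 attention_mask=inputs.attention_mask,
                                 max_new_tokens=200)
        return tokenizer.batch_decode(ids, skip_special_tokens=True)

    return translate_batch


def translate_all(sentences: List[str],
                  translate_batch: Callable[[List[str]], List[str]],
                  batch_size: int = 8) -> List[str]:
    """Translate `sentences` batch by batch, preserving their order."""
    results: List[str] = []
    for batch in batched(sentences, batch_size):
        results.extend(translate_batch(batch))
    return results
```

Typical usage would be `translate_all(sentences, load_translator())`. A suitable batch size depends on available memory and sentence length, so the default here is only a starting point.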

	### On CPU
	```python
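	from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

	# Load the tokenizer and model (they stay on the CPU by default)
	tokenizer = AutoTokenizer.from_pretrained("minhtoan/t5-translate-vietnamese-lao")
	model = AutoModelForSeq2SeqLM.from_pretrained("minhtoan/t5-translate-vietnamese-lao")
	model.eval()  # inference mode (disables dropout)

	src = "Tôi muốn mua một cuốn sách"
	# Tokenize the Vietnamese source, truncating inputs longer than 200 tokens
	inputs = tokenizer(src, max_length=200, return_tensors="pt", truncation=True)
	# Pass the attention mask explicitly along with the input IDs
	outputs = model.generate(input_ids=inputs.input_ids, attention_mask=inputs.attention_mask, max_new_tokens=200)
	output = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
	output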
	```
	'ຂ້ອຍຢາກຊື້ປຶ້ມ'
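
The snippets above cap inputs and outputs at 200 tokens, so a long passage passed in one call may be truncated. One way to handle this is to split the text into sentences and translate it in smaller chunks. The sketch below is an illustration only: `split_sentences`, `pack_chunks`, and the `max_chars=300` budget are hypothetical, and a character count is only a rough stand-in for the tokenizer's real token count.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Split text at whitespace that follows sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?…])\s+", text.strip())
    return [part for part in parts if part]


def pack_chunks(sentences: List[str], max_chars: int = 300) -> List[str]:
    """Greedily join consecutive sentences into chunks of at most `max_chars`.

    A single sentence longer than `max_chars` is kept as its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be translated separately with the code above and the results joined in order. Splitting on punctuation is a simplification and can mis-split abbreviations or decimal numbers.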



	## Author
	`Phan Minh Toan`