
	---
	license: apache-2.0
	language:
	- en
	- zh
	- ja
	- fr
	- es
	- it
	- pt
	tags:
	- generative translation
	- large language model
	- LLaMA
	metrics:
	- bleu
	pipeline_tag: text-generation
	datasets:
	- PeacefulData/HypoTranslate
	---
	This repo releases the trained LLaMA-Adapter weights from the paper "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators".

	Code: https://github.com/YUCHEN005/GenTranslate

	Data: https://huggingface.co/datasets/PeacefulData/HypoTranslate

	Model: This repo

	*Filename format:* [data\_source]\_[src\_language\_code]\_[tgt\_language\_code]\_[task].pth

	e.g. covost2_ar_en_st.pth
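
The filename format above can be split back into its four fields with a few lines of Python. This is a minimal sketch, not part of the released code: the names `CheckpointName` and `parse_checkpoint_name` are hypothetical, and it assumes that the language codes and the task ID contain no underscores (check the language code tables in the paper before relying on this).

```python
from typing import NamedTuple


class CheckpointName(NamedTuple):
    """Fields encoded in a checkpoint filename (hypothetical helper type)."""
    data_source: str
    src_lang: str
    tgt_lang: str
    task: str


def parse_checkpoint_name(filename: str) -> CheckpointName:
    """Split '[data_source]_[src]_[tgt]_[task].pth' into its four fields.

    Assumption: language codes and the task ID contain no underscores,
    so the last three underscore-separated tokens are src, tgt and task.
    """
    stem, dot, ext = filename.rpartition(".")
    if not dot or ext != "pth":
        raise ValueError(f"expected a .pth file, got {filename!r}")

    # Split from the right so that a data source containing underscores
    # stays intact in the first field.
    parts = stem.rsplit("_", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"filename does not match the expected format: {filename!r}")

    return CheckpointName(*parts)


if __name__ == "__main__":
    print(parse_checkpoint_name("covost2_ar_en_st.pth"))
```

Splitting from the right keeps the data source as a single field even if it contains underscores; the trade-off is that a language code with an underscore would be split incorrectly.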

	*Note:*
	- Language code look-up: Tables 15 and 17 in https://arxiv.org/pdf/2402.06894.pdf
	- The source and target languages refer to the translation task, so both the N-best hypotheses and the ground-truth transcription are in the target language
	- For speech translation datasets (FLEURS, CoVoST-2, MuST-C), the task ID "mt" denotes a cascaded ASR+MT system


	If you find this work related or useful for your research, please consider citing the paper below. Thank you.

	```bib
	@inproceedings{hu2024gentranslate,
	title = "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators",
	author = "Hu, Yuchen and Chen, Chen and Yang, Chao-Han Huck and Li, Ruizhe and Zhang, Dong and Chen, Zhehuai and Chng, Eng Siong",
	booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
	publisher = "Association for Computational Linguistics",
	year = "2024"
	}
	```