---
license: apache-2.0
language:
- en
- zh
- ja
- fr
- es
- it
- pt
tags:
- generative translation
- large language model
- LLaMA
metrics:
- bleu
pipeline_tag: text-generation
datasets:
- PeacefulData/HypoTranslate
---

This repo releases the trained LLaMA-Adapter weights from the paper "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators".

**Code:** https://github.com/YUCHEN005/GenTranslate

**Data:** https://huggingface.co/datasets/PeacefulData/HypoTranslate

**Model:** This repo

***Filename format:*** `[data_source]_[src_language_code]_[tgt_language_code]_[task].pth`

e.g., `covost2_ar_en_st.pth`

***Note:***

- Language codes: see Tables 15 and 17 in https://arxiv.org/pdf/2402.06894.pdf
- Source/target language refers to the translation task, so both the N-best hypotheses and the ground-truth transcription are in the target language.
- For speech translation datasets (FLEURS, CoVoST-2, MuST-C), the task ID "mt" denotes a cascaded ASR+MT system.
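
The filename convention can be unpacked mechanically. Below is a minimal Python sketch, assuming every checkpoint name follows the pattern exactly. `AdapterCheckpoint`, `parse_checkpoint_name`, `is_cascaded` and the `SPEECH_SOURCES` tags are hypothetical helpers, not part of the GenTranslate code; only `covost2` is a data-source tag confirmed by this card, while `fleurs` and `mustc` are guesses.

```python
from typing import NamedTuple


class AdapterCheckpoint(NamedTuple):
    """Fields of a checkpoint filename (hypothetical helper type)."""
    data_source: str
    src_lang: str
    tgt_lang: str
    task: str


def parse_checkpoint_name(filename: str) -> AdapterCheckpoint:
    """Split '[data_source]_[src]_[tgt]_[task].pth' into its four fields.

    Splits from the right, so a data-source tag that happened to contain
    an underscore (an assumption; none is confirmed) would still parse.
    """
    suffix = ".pth"
    if not filename.endswith(suffix):
        raise ValueError(f"expected a {suffix} file, got {filename!r}")
    stem = filename[: -len(suffix)]
    parts = stem.rsplit("_", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"unexpected filename format: {filename!r}")
    return AdapterCheckpoint(*parts)


# Guessed lower-case tags for the speech translation datasets in the note
# (FLEURS, CoVoST-2, MuST-C); only "covost2" appears in this card.
SPEECH_SOURCES = {"fleurs", "covost2", "mustc"}


def is_cascaded(ckpt: AdapterCheckpoint) -> bool:
    """Per the note, task 'mt' on a speech dataset means cascaded ASR+MT."""
    return ckpt.task == "mt" and ckpt.data_source in SPEECH_SOURCES


ckpt = parse_checkpoint_name("covost2_ar_en_st.pth")
print(ckpt)  # AdapterCheckpoint(data_source='covost2', src_lang='ar', tgt_lang='en', task='st')
```

Splitting from the right keeps the three trailing fields (source language, target language, task) unambiguous regardless of how the data-source tag is spelled.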

If you find this work relevant or useful for your research, please consider citing it. Thank you.

```bib
@inproceedings{hu2024gentranslate,
  title = "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators",
  author = "Hu, Yuchen and Chen, Chen and Yang, Chao-Han Huck and Li, Ruizhe and Zhang, Dong and Chen, Zhehuai and Chng, Eng Siong",
  booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
  publisher = "Association for Computational Linguistics",
  year = "2024"
}
```