---
license: apache-2.0
language:
- en
- zh
- ja
- fr
- es
- it
- pt
tags:
- generative translation
- large language model
- LLaMA
metrics:
- bleu
pipeline_tag: text-generation
datasets:
- PeacefulData/HypoTranslate
---
This repo releases the trained LLaMA-adapter weights from the paper "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators".
**Code:** https://github.com/YUCHEN005/GenTranslate
**Data:** https://huggingface.co/datasets/PeacefulData/HypoTranslate
**Model:** This repo
***Filename format:*** [data\_source]\_[src\_language\_code]\_[tgt\_language\_code]\_[task].pth
e.g. covost2_ar_en_st.pth
***Note:***
- Language code look-up: Tables 15 and 17 in https://arxiv.org/pdf/2402.06894.pdf
- Source and target language refer to the translation task, so both the N-best hypotheses and the ground-truth transcription are in the target language
- For speech translation datasets (FLEURS, CoVoST-2, MuST-C), the task ID "mt" denotes a cascaded ASR+MT system
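
The filename format above can be split back into its four fields programmatically. The following is a minimal sketch, not part of the released code: the names `CheckpointName` and `parse_checkpoint_name` are hypothetical, and it assumes that language codes and task IDs contain no underscores (the data source is allowed to, since the split runs from the right).

```python
from typing import NamedTuple


class CheckpointName(NamedTuple):
    """Fields encoded in a checkpoint filename (hypothetical helper type)."""
    data_source: str
    src_lang: str
    tgt_lang: str
    task: str


def parse_checkpoint_name(filename: str) -> CheckpointName:
    """Split '[data_source]_[src]_[tgt]_[task].pth' into its four fields.

    Assumption: language codes and task IDs contain no underscores, so we
    split from the right and leave any extra underscores in data_source.
    """
    suffix = ".pth"
    if not filename.endswith(suffix):
        raise ValueError(f"expected a {suffix} file, got: {filename!r}")
    stem = filename[: -len(suffix)]
    parts = stem.rsplit("_", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"filename does not match the expected format: {filename!r}")
    return CheckpointName(*parts)


info = parse_checkpoint_name("covost2_ar_en_st.pth")
print(info.data_source, info.src_lang, info.tgt_lang, info.task)
```

For the example filename, this yields the data source `covost2`, source language `ar`, target language `en`, and task `st`.
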
If you find this work related or useful for your research, please consider citing the paper below. Thank you.
```bib
@inproceedings{hu2024gentranslate,
title = "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators",
author = "Hu, Yuchen and Chen, Chen and Yang, Chao-Han Huck and Li, Ruizhe and Zhang, Dong and Chen, Zhehuai and Chng, Eng Siong",
booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
publisher = "Association for Computational Linguistics",
year = "2024"
}
```