---
{}
---

This repo contains an in-house LLaMA-7b model fine-tuned on the [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca) dataset, intended for research use only. A quantitative evaluation on machine translation and a qualitative comparison of general abilities can be found at [alpaca-mt](https://github.com/wxjiao/alpaca-mt).
Translation performance of LLMs on Flores subsets.

| System | De-En BLEU | De-En COMET | En-De BLEU | En-De COMET | Zh-En BLEU | Zh-En COMET | En-Zh BLEU | En-Zh COMET |
|---|---|---|---|---|---|---|---|---|
|  | 45.04 | 0.8879 | 41.16 | 0.8861 | 31.66 | 0.8771 | 43.58 | 0.8842 |
| DeepL | 49.23 | 0.8970 | 41.46 | 0.8903 | 31.22 | 0.8739 | 44.31 | 0.8811 |
| ChatGPT | 43.71 | 0.8910 | 38.87 | 0.8814 | 24.73 | 0.8581 | 38.27 | 0.8699 |
| GPT-4 | 46.00 | 0.8931 | 45.73 | 0.8928 | 28.50 | 0.8742 | 42.50 | 0.8840 |
| LLaMA-7b | 6.96 | 0.6548 | 3.64 | 0.5084 | 8.95 | 0.6340 | 0.10 | 0.4899 |
| Alpaca-7b | 36.00 | 0.8737 | 20.09 | 0.8003 | 14.37 | 0.8069 | 10.06 | 0.5604 |
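To summarize the table at a glance, one option is an unweighted macro-average of each system's BLEU and COMET over the four directions, plus the per-direction gain of Alpaca-7b over the base LLaMA-7b. The sketch below is a minimal illustration under that assumption: the averaging is not a metric reported by alpaca-mt, BLEU averaged across language pairs is only a coarse indicator, and the names `macro_average`, `best_system` and `gains` are hypothetical helpers, not part of any released code. The first table row has no system name, so it is kept under a placeholder label.

```python
from statistics import mean

# Directions in table order; each score pair is (BLEU, COMET).
DIRECTIONS = ("De-En", "En-De", "Zh-En", "En-Zh")

# Scores copied from the Flores table. The first row's system name is not
# given in the table, so it uses a placeholder label (an assumption).
SCORES = {
    "(unnamed row)": [(45.04, 0.8879), (41.16, 0.8861), (31.66, 0.8771), (43.58, 0.8842)],
    "DeepL": [(49.23, 0.8970), (41.46, 0.8903), (31.22, 0.8739), (44.31, 0.8811)],
    "ChatGPT": [(43.71, 0.8910), (38.87, 0.8814), (24.73, 0.8581), (38.27, 0.8699)],
    "GPT-4": [(46.00, 0.8931), (45.73, 0.8928), (28.50, 0.8742), (42.50, 0.8840)],
    "LLaMA-7b": [(6.96, 0.6548), (3.64, 0.5084), (8.95, 0.6340), (0.10, 0.4899)],
    "Alpaca-7b": [(36.00, 0.8737), (20.09, 0.8003), (14.37, 0.8069), (10.06, 0.5604)],
}


def macro_average(rows):
    """Hypothetical helper: {system: (mean BLEU, mean COMET)}, equal weight per direction."""
    return {
        system: (mean(b for b, _ in pairs), mean(c for _, c in pairs))
        for system, pairs in rows.items()
    }


def best_system(averages, index):
    """Hypothetical helper: system with the highest average on metric index (0=BLEU, 1=COMET)."""
    return max(averages, key=lambda s: averages[s][index])


def gains(rows, tuned, base):
    """Hypothetical helper: per-direction (BLEU, COMET) difference, tuned minus base."""
    return {
        d: (t[0] - b[0], t[1] - b[1])
        for d, t, b in zip(DIRECTIONS, rows[tuned], rows[base])
    }


if __name__ == "__main__":
    averages = macro_average(SCORES)
    for system, (bleu, comet) in averages.items():
        print(f"{system:14s} BLEU {bleu:6.2f}  COMET {comet:.4f}")
    print("best average BLEU:", best_system(averages, 0))
    print("best average COMET:", best_system(averages, 1))
    for d, (db, dc) in gains(SCORES, "Alpaca-7b", "LLaMA-7b").items():
        print(f"Alpaca-7b vs LLaMA-7b {d}: BLEU {db:+.2f}, COMET {dc:+.4f}")
```

Equal weighting per direction is the simplest choice; weighting by test-set size or reporting each direction separately, as the table does, may be more informative.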