---
{}
---

This repo contains an in-house instruction-tuned LLaMA-7b model trained on the [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca) dataset.

Quantitative evaluation on machine translation and a qualitative comparison of general abilities can be found at [alpaca-mt](https://github.com/wxjiao/alpaca-mt).



<div class="max-w-full overflow-auto">
<table>
<tr>
<th colspan="9" align="center">Translation Performance of LLMs on Flores <a style="font-weight:bold" href="https://github.com/wxjiao/Is-ChatGPT-A-Good-Translator">Subsets</a>.</th>
</tr>
<tr align="center" style="font-weight:bold">
<td>Direction</td>
<td colspan="2">De-En</td>
<td colspan="2">En-De</td>
<td colspan="2">Zh-En</td>
<td colspan="2">En-Zh</td>
</tr>
<tr align="center" style="font-weight:bold">
<td>Metric</td>
<td>BLEU</td>
<td>COMET</td>
<td>BLEU</td>
<td>COMET</td>
<td>BLEU</td>
<td>COMET</td>
<td>BLEU</td>
<td>COMET</td>
</tr>
<tr align="center">
<td>Google</td>
<td>45.04</td>
<td>0.8879</td>
<td>41.16</td>
<td>0.8861</td>
<td style="font-weight:bold">31.66</td>
<td style="font-weight:bold">0.8771</td>
<td>43.58</td>
<td style="font-weight:bold">0.8842</td>
</tr>
<tr align="center">
<td>DeepL</td>
<td style="font-weight:bold">49.23</td>
<td style="font-weight:bold">0.8970</td>
<td>41.46</td>
<td>0.8903</td>
<td>31.22</td>
<td>0.8739</td>
<td style="font-weight:bold">44.31</td>
<td>0.8811</td>
</tr>
<tr align="center">
<td>ChatGPT</td>
<td>43.71</td>
<td>0.8910</td>
<td>38.87</td>
<td>0.8814</td>
<td>24.73</td>
<td>0.8581</td>
<td>38.27</td>
<td>0.8699</td>
</tr>
<tr align="center">
<td>GPT-4</td>
<td>46.00</td>
<td>0.8931</td>
<td style="font-weight:bold">45.73</td>
<td style="font-weight:bold">0.8928</td>
<td>28.50</td>
<td>0.8742</td>
<td>42.50</td>
<td>0.8840</td>
</tr>
<tr align="center">
<td>LLaMA-7b</td>
<td>6.96</td>
<td>0.6548</td>
<td>3.64</td>
<td>0.5084</td>
<td>8.95</td>
<td>0.6340</td>
<td>0.10</td>
<td>0.4899</td>
</tr>
<tr align="center">
<td>Alpaca-7b</td>
<td>36.00</td>
<td>0.8737</td>
<td>20.09</td>
<td>0.8003</td>
<td>14.37</td>
<td>0.8069</td>
<td>10.06</td>
<td>0.5604</td>
</tr>
</table>
</div>
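
The BLEU column above measures n-gram overlap between system output and reference translations. Published numbers are typically produced with a standard toolkit such as sacrebleu; as an illustration only, here is a minimal stdlib-only sketch of corpus-level BLEU-4 with a brevity penalty (no smoothing, naive whitespace tokenization — not a drop-in replacement for the toolkit scores in the table):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU-4 with brevity penalty.

    Illustrative sketch: whitespace tokenization, single reference per
    hypothesis, no smoothing. Real evaluations should use sacrebleu.
    """
    matches = [0] * max_n   # clipped n-gram matches, per order
    totals = [0] * max_n    # candidate n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            # Counter intersection clips each n-gram to its reference count
            matches[n - 1] += sum((ngrams(h, n) & ngrams(r, n)).values())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any order has no match
    precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish hypotheses shorter than the references
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(precision)
```

A perfect match scores 100, e.g. `corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"])` returns `100.0`. COMET, by contrast, is a learned neural metric (the `unbabel-comet` package) and cannot be reduced to a short formula like this.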