system HF staff committed on
Commit
38f0e3e
1 Parent(s): d24c931

Update README.md

Files changed (1)
  1. README.md +147 -10
README.md CHANGED
@@ -1,24 +1,161 @@
1
  ---
2
  tags:
3
  - translation
 
 
4
  ---
5
 
6
- ### opus-mt-ROMANCE-en
7
 
8
- * source languages: fr,fr_BE,fr_CA,fr_FR,wa,frp,oc,ca,rm,lld,fur,lij,lmo,es,es_AR,es_CL,es_CO,es_CR,es_DO,es_EC,es_ES,es_GT,es_HN,es_MX,es_NI,es_PA,es_PE,es_PR,es_SV,es_UY,es_VE,pt,pt_br,pt_BR,pt_PT,gl,lad,an,mwl,it,it_IT,co,nap,scn,vec,sc,ro,la
9
- * target languages: en
10
- * OPUS readme: [fr+fr_BE+fr_CA+fr_FR+wa+frp+oc+ca+rm+lld+fur+lij+lmo+es+es_AR+es_CL+es_CO+es_CR+es_DO+es_EC+es_ES+es_GT+es_HN+es_MX+es_NI+es_PA+es_PE+es_PR+es_SV+es_UY+es_VE+pt+pt_br+pt_BR+pt_PT+gl+lad+an+mwl+it+it_IT+co+nap+scn+vec+sc+ro+la-en](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/models/fr+fr_BE+fr_CA+fr_FR+wa+frp+oc+ca+rm+lld+fur+lij+lmo+es+es_AR+es_CL+es_CO+es_CR+es_DO+es_EC+es_ES+es_GT+es_HN+es_MX+es_NI+es_PA+es_PE+es_PR+es_SV+es_UY+es_VE+pt+pt_br+pt_BR+pt_PT+gl+lad+an+mwl+it+it_IT+co+nap+scn+vec+sc+ro+la-en/README.md)
11
 
12
- * dataset: opus
 
 
13
  * model: transformer
14
- * pre-processing: normalization + SentencePiece
15
- * download original weights: [opus-2020-04-01.zip](https://object.pouta.csc.fi/OPUS-MT-models/fr+fr_BE+fr_CA+fr_FR+wa+frp+oc+ca+rm+lld+fur+lij+lmo+es+es_AR+es_CL+es_CO+es_CR+es_DO+es_EC+es_ES+es_GT+es_HN+es_MX+es_NI+es_PA+es_PE+es_PR+es_SV+es_UY+es_VE+pt+pt_br+pt_BR+pt_PT+gl+lad+an+mwl+it+it_IT+co+nap+scn+vec+sc+ro+la-en/opus-2020-04-01.zip)
16
- * test set translations: [opus-2020-04-01.test.txt](https://object.pouta.csc.fi/OPUS-MT-models/fr+fr_BE+fr_CA+fr_FR+wa+frp+oc+ca+rm+lld+fur+lij+lmo+es+es_AR+es_CL+es_CO+es_CR+es_DO+es_EC+es_ES+es_GT+es_HN+es_MX+es_NI+es_PA+es_PE+es_PR+es_SV+es_UY+es_VE+pt+pt_br+pt_BR+pt_PT+gl+lad+an+mwl+it+it_IT+co+nap+scn+vec+sc+ro+la-en/opus-2020-04-01.test.txt)
17
- * test set scores: [opus-2020-04-01.eval.txt](https://object.pouta.csc.fi/OPUS-MT-models/fr+fr_BE+fr_CA+fr_FR+wa+frp+oc+ca+rm+lld+fur+lij+lmo+es+es_AR+es_CL+es_CO+es_CR+es_DO+es_EC+es_ES+es_GT+es_HN+es_MX+es_NI+es_PA+es_PE+es_PR+es_SV+es_UY+es_VE+pt+pt_br+pt_BR+pt_PT+gl+lad+an+mwl+it+it_IT+co+nap+scn+vec+sc+ro+la-en/opus-2020-04-01.eval.txt)
18
 
19
  ## Benchmarks
20
 
21
  | testset | BLEU | chr-F |
22
  |-----------------------|-------|-------|
23
- | Tatoeba.fr.en | 62.2 | 0.750 |
24
 
 
 
1
  ---
2
+ language:
3
+ -it
4
+ -ca
5
+ -rm
6
+ -es
7
+ -ro
8
+ -gl
9
+ -co
10
+ -wa
11
+ -pt
12
+ -oc
13
+ -an
14
+ -id
15
+ -fr
16
+ -ht
17
+ -roa
18
+ -en
19
+
20
  tags:
21
  - translation
22
+
23
+ license: apache-2.0
24
  ---
25
 
26
+ ### roa-eng
27
 
28
+ * source group: Romance languages
29
+ * target group: English
30
+ * OPUS readme: [roa-eng](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/roa-eng/README.md)
31
 
32
+ * model: transformer
33
+ * source language(s): arg ast cat cos egl ext fra frm_Latn gcf_Latn glg hat ind ita lad lad_Latn lij lld_Latn lmo max_Latn mfe min mwl oci pap pms por roh ron scn spa tmw_Latn vec wln zlm_Latn zsm_Latn
34
+ * target language(s): eng
35
  * model: transformer
36
+ * pre-processing: normalization + SentencePiece (spm32k,spm32k)
37
+ * download original weights: [opus2m-2020-08-01.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/roa-eng/opus2m-2020-08-01.zip)
38
+ * test set translations: [opus2m-2020-08-01.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/roa-eng/opus2m-2020-08-01.test.txt)
39
+ * test set scores: [opus2m-2020-08-01.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/roa-eng/opus2m-2020-08-01.eval.txt)
40
 
41
  ## Benchmarks
42
 
43
  | testset | BLEU | chr-F |
44
  |-----------------------|-------|-------|
45
+ | newsdev2016-enro-roneng.ron.eng | 37.1 | 0.631 |
46
+ | newsdiscussdev2015-enfr-fraeng.fra.eng | 31.6 | 0.564 |
47
+ | newsdiscusstest2015-enfr-fraeng.fra.eng | 36.1 | 0.592 |
48
+ | newssyscomb2009-fraeng.fra.eng | 29.3 | 0.563 |
49
+ | newssyscomb2009-itaeng.ita.eng | 33.1 | 0.589 |
50
+ | newssyscomb2009-spaeng.spa.eng | 29.2 | 0.562 |
51
+ | news-test2008-fraeng.fra.eng | 25.2 | 0.533 |
52
+ | news-test2008-spaeng.spa.eng | 26.6 | 0.542 |
53
+ | newstest2009-fraeng.fra.eng | 28.6 | 0.557 |
54
+ | newstest2009-itaeng.ita.eng | 32.0 | 0.580 |
55
+ | newstest2009-spaeng.spa.eng | 28.9 | 0.559 |
56
+ | newstest2010-fraeng.fra.eng | 29.9 | 0.573 |
57
+ | newstest2010-spaeng.spa.eng | 33.3 | 0.596 |
58
+ | newstest2011-fraeng.fra.eng | 31.2 | 0.585 |
59
+ | newstest2011-spaeng.spa.eng | 32.3 | 0.584 |
60
+ | newstest2012-fraeng.fra.eng | 31.3 | 0.580 |
61
+ | newstest2012-spaeng.spa.eng | 35.3 | 0.606 |
62
+ | newstest2013-fraeng.fra.eng | 31.9 | 0.575 |
63
+ | newstest2013-spaeng.spa.eng | 32.8 | 0.592 |
64
+ | newstest2014-fren-fraeng.fra.eng | 34.6 | 0.611 |
65
+ | newstest2016-enro-roneng.ron.eng | 35.8 | 0.614 |
66
+ | Tatoeba-test.arg-eng.arg.eng | 38.7 | 0.512 |
67
+ | Tatoeba-test.ast-eng.ast.eng | 35.2 | 0.520 |
68
+ | Tatoeba-test.cat-eng.cat.eng | 54.9 | 0.703 |
69
+ | Tatoeba-test.cos-eng.cos.eng | 68.1 | 0.666 |
70
+ | Tatoeba-test.egl-eng.egl.eng | 6.7 | 0.209 |
71
+ | Tatoeba-test.ext-eng.ext.eng | 24.2 | 0.427 |
72
+ | Tatoeba-test.fra-eng.fra.eng | 53.9 | 0.691 |
73
+ | Tatoeba-test.frm-eng.frm.eng | 25.7 | 0.423 |
74
+ | Tatoeba-test.gcf-eng.gcf.eng | 14.8 | 0.288 |
75
+ | Tatoeba-test.glg-eng.glg.eng | 54.6 | 0.703 |
76
+ | Tatoeba-test.hat-eng.hat.eng | 37.0 | 0.540 |
77
+ | Tatoeba-test.ita-eng.ita.eng | 64.8 | 0.768 |
78
+ | Tatoeba-test.lad-eng.lad.eng | 21.7 | 0.452 |
79
+ | Tatoeba-test.lij-eng.lij.eng | 11.2 | 0.299 |
80
+ | Tatoeba-test.lld-eng.lld.eng | 10.8 | 0.273 |
81
+ | Tatoeba-test.lmo-eng.lmo.eng | 5.8 | 0.260 |
82
+ | Tatoeba-test.mfe-eng.mfe.eng | 63.1 | 0.819 |
83
+ | Tatoeba-test.msa-eng.msa.eng | 40.9 | 0.592 |
84
+ | Tatoeba-test.multi.eng | 54.9 | 0.697 |
85
+ | Tatoeba-test.mwl-eng.mwl.eng | 44.6 | 0.674 |
86
+ | Tatoeba-test.oci-eng.oci.eng | 20.5 | 0.404 |
87
+ | Tatoeba-test.pap-eng.pap.eng | 56.2 | 0.669 |
88
+ | Tatoeba-test.pms-eng.pms.eng | 10.3 | 0.324 |
89
+ | Tatoeba-test.por-eng.por.eng | 59.7 | 0.738 |
90
+ | Tatoeba-test.roh-eng.roh.eng | 14.8 | 0.378 |
91
+ | Tatoeba-test.ron-eng.ron.eng | 55.2 | 0.703 |
92
+ | Tatoeba-test.scn-eng.scn.eng | 10.2 | 0.259 |
93
+ | Tatoeba-test.spa-eng.spa.eng | 56.2 | 0.714 |
94
+ | Tatoeba-test.vec-eng.vec.eng | 13.8 | 0.317 |
95
+ | Tatoeba-test.wln-eng.wln.eng | 17.3 | 0.323 |
96
+
97
+
98
+ ### System Info:
99
+ - hf_name: roa-eng
100
+
101
+ - source_languages: roa
102
+
103
+ - target_languages: eng
104
+
105
+ - opus_readme_url: https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/roa-eng/README.md
106
+
107
+ - original_repo: Tatoeba-Challenge
108
+
109
+ - tags: ['translation']
110
+
111
+ - languages: ['it', 'ca', 'rm', 'es', 'ro', 'gl', 'co', 'wa', 'pt', 'oc', 'an', 'id', 'fr', 'ht', 'roa', 'en']
112
+
113
+ - src_constituents: {'ita', 'cat', 'roh', 'spa', 'pap', 'lmo', 'mwl', 'lij', 'lad_Latn', 'ext', 'ron', 'ast', 'glg', 'pms', 'zsm_Latn', 'gcf_Latn', 'lld_Latn', 'min', 'tmw_Latn', 'cos', 'wln', 'zlm_Latn', 'por', 'egl', 'oci', 'vec', 'arg', 'ind', 'fra', 'hat', 'lad', 'max_Latn', 'frm_Latn', 'scn', 'mfe'}
114
+
115
+ - tgt_constituents: {'eng'}
116
+
117
+ - src_multilingual: True
118
+
119
+ - tgt_multilingual: False
120
+
121
+ - prepro: normalization + SentencePiece (spm32k,spm32k)
122
+
123
+ - url_model: https://object.pouta.csc.fi/Tatoeba-MT-models/roa-eng/opus2m-2020-08-01.zip
124
+
125
+ - url_test_set: https://object.pouta.csc.fi/Tatoeba-MT-models/roa-eng/opus2m-2020-08-01.test.txt
126
+
127
+ - src_alpha3: roa
128
+
129
+ - tgt_alpha3: eng
130
+
131
+ - short_pair: roa-en
132
+
133
+ - chrF2_score: 0.6970000000000001
134
+
135
+ - bleu: 54.9
136
+
137
+ - brevity_penalty: 0.9790000000000001
138
+
139
+ - ref_len: 74762.0
140
+
141
+ - src_name: Romance languages
142
+
143
+ - tgt_name: English
144
+
145
+ - train_date: 2020-08-01
146
+
147
+ - src_alpha2: roa
148
+
149
+ - tgt_alpha2: en
150
+
151
+ - prefer_old: False
152
+
153
+ - long_pair: roa-eng
154
+
155
+ - helsinki_git_sha: 480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535
156
+
157
+ - transformers_git_sha: 6bdf998dffa70030e42f512a586f33a15e648edd
158
+
159
+ - port_machine: brutasse
160
 
161
+ - port_time: 2020-08-19-00:09
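
The `bleu`, `brevity_penalty`, `ref_len`, and `chrF2_score` fields in the System Info list can be cross-checked with a little arithmetic. The sketch below is only an illustration, not part of the model card. It assumes the standard BLEU brevity penalty, `BP = exp(1 - ref_len / hyp_len)` for hypotheses shorter than the reference, and the helper name `hypothesis_length` is hypothetical. The card reports the penalty rounded to three decimals, so the recovered length is an estimate.

```python
import math

# Values copied from the System Info fields of the model card.
bleu = 54.9
brevity_penalty = 0.9790000000000001
ref_len = 74762.0
chrf2_score = 0.6970000000000001


def hypothesis_length(bp: float, ref_len: float) -> float:
    """Invert BP = exp(1 - ref_len / hyp_len) to estimate the output length.

    Assumption: the standard BLEU brevity penalty was used. For bp == 1.0
    the result is only a lower bound, because any hypothesis at least as
    long as the reference also yields a penalty of 1.0.
    """
    if not 0.0 < bp <= 1.0:
        raise ValueError("brevity penalty must be in (0, 1]")
    return ref_len / (1.0 - math.log(bp))


hyp_len = hypothesis_length(brevity_penalty, ref_len)
bleu_without_bp = bleu / brevity_penalty

print(f"estimated hypothesis length: {hyp_len:.0f} tokens")
print(f"BLEU before brevity penalty: {bleu_without_bp:.1f}")
# The long decimals in the card are floating-point noise; round for display.
print(f"chrF2 rounded: {round(chrf2_score, 3)}")
```

Under these assumptions the system output comes out roughly two percent shorter than the reference, which is consistent with a penalty just below 1.0.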