urtzai committed on
Commit 78f0d8e
1 Parent(s): 586e035

README file updated

Files changed (1): README.md (+66, -0)

README.md CHANGED

---
license: mit
language:
- en
- eu
metrics:
- BLEU
- TER
tags:
- text2text-generation
- open-nmt
- pytorch
---

# Itzune v1.9 EN → EU machine translation Argos model

This model was trained using the [argostrain](https://github.com/argosopentech/argos-train) training scripts on 11,542,706 English-Basque parallel sentences extracted from datasets obtained directly from the [OPUS project](https://opus.nlpl.eu/). A minimal usage sketch is shown below.

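As an illustration, the packaged model can be installed and used through the [argostranslate](https://github.com/argosopentech/argos-translate) Python API. This is a minimal sketch, assuming the `.argosmodel` package file from this repository has been downloaded locally; the file name below is a placeholder:

```python
import argostranslate.package
import argostranslate.translate

# Install the downloaded model package (placeholder file name; use the
# actual .argosmodel file distributed with this repository).
argostranslate.package.install_from_path("translate-en_eu.argosmodel")

# Translate English text into Basque.
translated = argostranslate.translate.translate("Hello, how are you?", "en", "eu")
print(translated)
```
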
## Model description

- **Developed by:** argostranslate
- **Model type:** translation
- **Model version:** v1.9
- **Source Language:** English
- **Target Language:** Basque
- **License:** MIT

## Training Data

The English-Basque parallel sentences were collected from the following datasets:

| Dataset             | Sentences before cleaning |
|---------------------|--------------------------:|
| CCMatrix v1         |                 7,788,871 |
| OpenSubtitles v2018 |                   805,780 |
| XLEnt v1.2          |                   800,631 |
| GNOME v1            |                   652,298 |
| HPLT v1.1           |                   610,694 |
| EhuHac v1           |                   585,210 |
| WikiMatrix v1       |                   119,480 |
| KDE4 v2             |                   100,160 |
| wikimedia v20230407 |                    60,990 |
| bible-uedin v1      |                    15,893 |
| Tatoeba v2023-04-12 |                     2,070 |
| Wiktionary          |                       629 |
| **Total**           |            **11,542,706** |

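As a quick arithmetic check, the per-dataset counts in the table sum to the reported total:

```python
# Per-dataset sentence counts copied from the table above.
counts = [
    7_788_871,  # CCMatrix v1
    805_780,    # OpenSubtitles v2018
    800_631,    # XLEnt v1.2
    652_298,    # GNOME v1
    610_694,    # HPLT v1.1
    585_210,    # EhuHac v1
    119_480,    # WikiMatrix v1
    100_160,    # KDE4 v2
    60_990,     # wikimedia v20230407
    15_893,     # bible-uedin v1
    2_070,      # Tatoeba v2023-04-12
    629,        # Wiktionary
]

assert sum(counts) == 11_542_706
print(f"{sum(counts):,}")  # 11,542,706
```
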
### Evaluation results

Below are evaluation results for machine translation from English to Basque, comparing this model against [Google Translate](https://translate.google.com/), [NLLB 200 3.3B](https://huggingface.co/facebook/nllb-200-3.3B), and [mt-hitz-en-eu](https://huggingface.co/HiTZ/mt-hitz-en-eu). A sketch of computing these metrics follows the tables.

#### BLEU scores (higher is better)

| Test set           | Google Translate | NLLB 3.3B | mt-hitz-en-eu | itzune 1.9 |
|--------------------|-----------------:|----------:|--------------:|-----------:|
| Flores 200 devtest | **20.5**         | 13.3      | 19.2          | 17.0       |
| TaCON              | **12.1**         | 9.4       | 8.8           | -          |
| NTREX              | **15.7**         | 8.0       | 14.5          | -          |
| Average            | **16.1**         | 10.2      | 14.2          | -          |

#### TER scores (lower is better)

| Test set           | Google Translate | NLLB 3.3B | mt-hitz-en-eu | itzune 1.9 |
|--------------------|-----------------:|----------:|--------------:|-----------:|
| Flores 200 devtest | **59.5**         | 70.4      | 65.0          | 70.1       |
| TaCON              | **69.5**         | 75.3      | 76.8          | -          |
| NTREX              | **65.8**         | 81.6      | 66.7          | -          |
| Average            | **64.9**         | 75.8      | 68.2          | -          |

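For reference, BLEU and TER scores of this kind can be computed with the [sacrebleu](https://github.com/mjpost/sacrebleu) library. This is a sketch under the assumption of hypothetical plain-text files with one sentence per line; it is not the exact evaluation script used for the tables above:

```python
from sacrebleu.metrics import BLEU, TER

# Hypothetical file names: system output and reference translations
# for a test set such as Flores 200 devtest, one sentence per line.
with open("hypotheses.eu", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("references.eu", encoding="utf-8") as f:
    references = [line.strip() for line in f]

# sacrebleu expects a list of reference streams, hence the extra list.
print(BLEU().corpus_score(hypotheses, [references]))
print(TER().corpus_score(hypotheses, [references]))
```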