Itzune v1.9 EN -> EU machine translation argos model
This model was trained with the argostrain training scripts on 11,542,706 English-Basque parallel sentences extracted from datasets obtained directly from the OPUS project.
Model description
- Developed by: argostranslate
- Model type: translation
- Model version: v1.9
- Source Language: English
- Target Language: Basque
- License: MIT
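As a usage sketch, the model can presumably be loaded like any other Argos Translate package. The example below assumes the `argostranslate` Python package is installed and that the model has already been downloaded as a `.argosmodel` file. The helper names `install_model` and `translate_en_eu`, and the file name used in the note after the code, are hypothetical and not taken from this card.

```python
from pathlib import Path


def install_model(model_path: str) -> None:
    """Install a locally downloaded .argosmodel package (path is an assumption)."""
    # Imported lazily so this file can be loaded before argostranslate is installed.
    import argostranslate.package

    argostranslate.package.install_from_path(Path(model_path))


def translate_en_eu(text: str) -> str:
    """Translate English text to Basque with the installed package."""
    import argostranslate.translate

    # "en" and "eu" are the ISO 639-1 codes for English and Basque.
    return argostranslate.translate.translate(text, "en", "eu")
```

After a one-time call such as `install_model("translate-en_eu-1_9.argosmodel")` (a hypothetical file name), `translate_en_eu("Hello, how are you?")` should return the Basque translation as a string.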
Training Data
The English-Basque parallel sentences were collected from the following datasets:
Dataset | Sentences before cleaning |
---|---|
CCMatrix v1 | 7,788,871 |
OpenSubtitles v2018 | 805,780 |
XLEnt v1.2 | 800,631 |
GNOME v1 | 652,298 |
HPLT v1.1 | 610,694 |
EhuHac v1 | 585,210 |
WikiMatrix v1 | 119,480 |
KDE4 v2 | 100,160 |
wikimedia v20230407 | 60,990 |
bible-uedin v1 | 15,893 |
Tatoeba v2023-04-12 | 2,070 |
Wiktionary | 629 |
Total | 11,542,706 |
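To see how the corpus is distributed across sources, the counts above can be summed and turned into percentages. This is only an illustrative sketch: the figures are copied from the table, and the `share` helper is a hypothetical name introduced here.

```python
# Sentence counts before cleaning, copied from the table above.
counts = {
    "CCMatrix v1": 7_788_871,
    "OpenSubtitles v2018": 805_780,
    "XLEnt v1.2": 800_631,
    "GNOME v1": 652_298,
    "HPLT v1.1": 610_694,
    "EhuHac v1": 585_210,
    "WikiMatrix v1": 119_480,
    "KDE4 v2": 100_160,
    "wikimedia v20230407": 60_990,
    "bible-uedin v1": 15_893,
    "Tatoeba v2023-04-12": 2_070,
    "Wiktionary": 629,
}

total = sum(counts.values())


def share(name: str) -> float:
    """Percentage of the raw corpus contributed by one dataset."""
    return 100 * counts[name] / total


print(f"Total: {total:,}")  # Total: 11,542,706
for name in counts:
    print(f"{name:<22} {share(name):5.1f}%")
```

The sum reproduces the Total row, and CCMatrix alone accounts for roughly two thirds of the raw sentences.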
Evaluation results
The tables below compare this model's English-to-Basque translation quality with Google Translate, NLLB 200 3.3B and mt-hitz-en-eu. Higher BLEU is better and lower TER is better. A dash (-) means that no score is reported.
BLEU scores
Test set | Google Translate | NLLB 200 3.3B | mt-hitz-en-eu | itzune 1.9 |
---|---|---|---|---|
Flores 200 devtest | 20.5 | 13.3 | 19.2 | 17.0 |
TaCON | 12.1 | 9.4 | 8.8 | - |
NTREX | 15.7 | 8.0 | 14.5 | - |
Average | 16.1 | 10.2 | 14.2 | - |
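The BLEU Average row is consistent with an unweighted mean over the three test sets, which the short sketch below checks. The scores are copied from the BLEU table. Since itzune 1.9 reports a score only for Flores 200 devtest, it is left out of the averages.

```python
from statistics import mean

# BLEU per test set, in the order Flores 200 devtest, TaCON, NTREX.
bleu = {
    "Google Translate": [20.5, 12.1, 15.7],
    "NLLB 200 3.3B": [13.3, 9.4, 8.0],
    "mt-hitz-en-eu": [19.2, 8.8, 14.5],
}

# Unweighted mean per system, rounded to one decimal like the table.
averages = {system: round(mean(scores), 1) for system, scores in bleu.items()}
print(averages)
```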
TER scores
Test set | Google Translate | NLLB 200 3.3B | mt-hitz-en-eu | itzune 1.9 |
---|---|---|---|---|
Flores 200 devtest | 59.5 | 70.4 | 65.0 | 70.1 |
TaCON | 69.5 | 75.3 | 76.8 | - |
NTREX | 65.8 | 81.6 | 66.7 | - |
Average | 64.9 | 75.8 | 68.2 | - |