---
license: mit
language:
- en
- eu
metrics:
- BLEU
- TER
tags:
- text2text-generation
- open-nmt
- pytorch
---

# Itzune v1.9 EN -> EU machine translation Argos model

This model was trained with the [argostrain](https://github.com/argosopentech/argos-train) training scripts on 11,542,706 English-Basque parallel strings extracted from datasets obtained directly from the [Opus project](https://opus.nlpl.eu/).

## Model description

- **Developed by:** argostranslate
- **Model type:** translation
- **Model version:** v1.9
- **Source Language:** English
- **Target Language:** Basque
- **License:** MIT

## Training Data

The English-Basque parallel sentences were collected from the following datasets:

| Dataset              | Sentences before cleaning |
|----------------------|--------------------------:|
| CCMatrix v1          |                 7,788,871 |
| OpenSubtitles v2018  |                   805,780 |
| XLEnt v1.2           |                   800,631 |
| GNOME v1             |                   652,298 |
| HPLT v1.1            |                   610,694 |
| EhuHac v1            |                   585,210 |
| WikiMatrix v1        |                   119,480 |
| KDE4 v2              |                   100,160 |
| wikimedia v20230407  |                    60,990 |
| bible-uedin v1       |                    15,893 |
| Tatoeba v2023-04-12  |                     2,070 |
| Wiktionary           |                       629 |
| **Total**            |            **11,542,706** |

### Evaluation results

Below are the evaluation results for English-to-Basque machine translation, compared with [Google Translate](https://translate.google.com/), [NLLB 200 3.3B](https://huggingface.co/facebook/nllb-200-3.3B) and [mt-hitz-en-eu](https://huggingface.co/HiTZ/mt-hitz-en-eu). Higher BLEU and lower TER are better; the best score in each row is shown in bold, and `-` marks a test set with no reported score.

#### BLEU scores

| Test set             | Google Translate | NLLB 200 3.3B | mt-hitz-en-eu | Itzune v1.9 |
|----------------------|------------------|---------------|---------------|-------------|
| Flores 200 devtest   | **20.5**         | 13.3          | 19.2          | 17.0        |
| TaCON                | **12.1**         | 9.4           | 8.8           | -           |
| NTREX                | **15.7**         | 8.0           | 14.5          | -           |
| Average              | **16.1**         | 10.2          | 14.2          | -           |

#### TER scores

| Test set             | Google Translate | NLLB 200 3.3B | mt-hitz-en-eu | Itzune v1.9 |
|----------------------|------------------|---------------|---------------|-------------|
| Flores 200 devtest   | **59.5**         | 70.4          | 65.0          | 70.1        |
| TaCON                | **69.5**         | 75.3          | 76.8          | -           |
| NTREX                | **65.8**         | 81.6          | 66.7          | -           |
| Average              | **64.9**         | 75.8          | 68.2          | -           |
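
A model trained with argostrain is normally packaged as an `.argosmodel` file and loaded through the Argos Translate Python library. The following is a minimal sketch of that workflow, not an official usage recipe for this model: it assumes the `argostranslate` package is installed and that the model file has already been downloaded locally. The helper name `translate_en_eu` and any file name passed to it are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def translate_en_eu(text, model_path):
    """Translate English text to Basque with a locally stored Argos model.

    `model_path` points to a downloaded .argosmodel file; its name and
    location are assumptions, since the model card does not specify them.
    """
    model_file = Path(model_path)
    if not model_file.is_file():
        raise FileNotFoundError(f"Argos model file not found: {model_file}")

    # Imported lazily so the path check above works without the library.
    import argostranslate.package
    import argostranslate.translate

    # Register the package with Argos Translate. Doing this on every call
    # keeps the sketch short; real code would install the model once.
    argostranslate.package.install_from_path(model_file)

    # "en" and "eu" are the language codes listed in the card metadata.
    return argostranslate.translate.translate(text, "en", "eu")
```

With the library installed (`pip install argostranslate`), calling `translate_en_eu("Good morning", "path/to/model.argosmodel")` would return the Basque translation as a string.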