Itzune v1.9 EN -> EU machine translation argos model

This model was trained with the argostrain training scripts on 11,542,706 English-Basque parallel sentences (counted before cleaning) drawn from datasets published by the OPUS project.

Model description

  • Developed by: argostranslate
  • Model type: translation
  • Model version: v1.9
  • Source Language: English
  • Target Language: Basque
  • License: MIT
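
Argos models are typically used through the argostranslate Python package. The following is a minimal sketch, assuming the package is installed and the model has been downloaded as a local `.argosmodel` file. The file name in the usage example is hypothetical, and the helper `translate_en_to_eu` is an illustrative name, not part of the library.

```python
from pathlib import Path
from typing import Optional


def translate_en_to_eu(text: str, model_path: Optional[str] = None) -> str:
    """Translate English text to Basque with an installed Argos model.

    If model_path is given, the .argosmodel package at that path is
    installed first (the path is supplied by the caller; hypothetical here).
    """
    # Imported lazily so this module loads even without argostranslate.
    import argostranslate.package
    import argostranslate.translate

    if model_path is not None:
        argostranslate.package.install_from_path(Path(model_path))

    # "en" and "eu" are the ISO 639-1 codes for English and Basque.
    return argostranslate.translate.translate(text, "en", "eu")


if __name__ == "__main__":
    # Hypothetical file name; replace with the actual downloaded package.
    print(translate_en_to_eu("Good morning.", "translate-en_eu-1_9.argosmodel"))
```

The installation step only needs to run once; later calls can omit `model_path`.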

Training Data

The English-Basque parallel sentences were collected from the following datasets:

| Dataset | Sentences before cleaning |
| --- | ---: |
| CCMatrix v1 | 7,788,871 |
| OpenSubtitles v2018 | 805,780 |
| XLEnt v1.2 | 800,631 |
| GNOME v1 | 652,298 |
| HPLT v1.1 | 610,694 |
| EhuHac v1 | 585,210 |
| WikiMatrix v1 | 119,480 |
| KDE4 v2 | 100,160 |
| wikimedia v20230407 | 60,990 |
| bible-uedin v1 | 15,893 |
| Tatoeba v2023-04-12 | 2,070 |
| Wiktionary | 629 |
| Total | 11,542,706 |
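
As a quick sanity check, the per-dataset counts can be summed and compared with the reported total, and each dataset's share of the corpus computed. This is only an illustrative sketch: the numbers are copied from the table above, and the names `DATASETS` and `dataset_shares` are introduced here for the example.

```python
# Sentence counts before cleaning, copied from the training data table.
DATASETS = {
    "CCMatrix v1": 7_788_871,
    "OpenSubtitles v2018": 805_780,
    "XLEnt v1.2": 800_631,
    "GNOME v1": 652_298,
    "HPLT v1.1": 610_694,
    "EhuHac v1": 585_210,
    "WikiMatrix v1": 119_480,
    "KDE4 v2": 100_160,
    "wikimedia v20230407": 60_990,
    "bible-uedin v1": 15_893,
    "Tatoeba v2023-04-12": 2_070,
    "Wiktionary": 629,
}
REPORTED_TOTAL = 11_542_706


def dataset_shares(counts):
    """Return each dataset's fraction of the summed sentence count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total = sum(DATASETS.values())
shares = dataset_shares(DATASETS)

print("Sum matches reported total:", total == REPORTED_TOTAL)
# List datasets from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22} {share:6.1%}")
```

The sum agrees with the reported total, and CCMatrix alone accounts for roughly two thirds of the sentences before cleaning.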

Evaluation results

The tables below compare this model's English-to-Basque translation quality with Google Translate, NLLB 200 3.3B and mt-hitz-en-eu:

BLEU scores

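| Test set | Google Translate | NLLB 200 3.3B | mt-hitz-en-eu | Itzune 1.9 |
| --- | ---: | ---: | ---: | ---: |
| Flores 200 devtest | 20.5 | 13.3 | 19.2 | 17.0 |
| TaCON | 12.1 | 9.4 | 8.8 | - |
| NTREX | 15.7 | 8.0 | 14.5 | - |
| Average | 16.1 | 10.2 | 14.2 | - |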
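
The Average row of the BLEU table can be reproduced as follows. This sketch assumes the average is a plain unweighted mean over the three test sets, rounded to one decimal; that assumption is consistent with the BLEU figures but is not stated in the card. The names `BLEU` and `average_bleu` are introduced for the example.

```python
# BLEU scores copied from the table above; Itzune 1.9 is reported
# only on Flores 200 devtest, so it has a single entry.
BLEU = {
    "Google Translate": {"Flores 200 devtest": 20.5, "TaCON": 12.1, "NTREX": 15.7},
    "NLLB 200 3.3B": {"Flores 200 devtest": 13.3, "TaCON": 9.4, "NTREX": 8.0},
    "mt-hitz-en-eu": {"Flores 200 devtest": 19.2, "TaCON": 8.8, "NTREX": 14.5},
    "Itzune 1.9": {"Flores 200 devtest": 17.0},
}
TEST_SETS = ("Flores 200 devtest", "TaCON", "NTREX")


def average_bleu(scores):
    """Unweighted mean over all test sets, or None if any score is missing."""
    if any(test_set not in scores for test_set in TEST_SETS):
        return None
    return round(sum(scores[t] for t in TEST_SETS) / len(TEST_SETS), 1)


averages = {system: average_bleu(scores) for system, scores in BLEU.items()}
for system, avg in averages.items():
    print(f"{system:<18} {avg if avg is not None else '-'}")
```

Because Itzune 1.9 has a score on only one test set, no average is computed for it, which matches the "-" in the table.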

TER scores

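| Test set | Google Translate | NLLB 200 3.3B | mt-hitz-en-eu | Itzune 1.9 |
| --- | ---: | ---: | ---: | ---: |
| Flores 200 devtest | 59.5 | 70.4 | 65.0 | 70.1 |
| TaCON | 69.5 | 75.3 | 76.8 | - |
| NTREX | 65.8 | 81.6 | 66.7 | - |
| Average | 64.9 | 75.8 | 68.2 | - |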