mt-general-cy-en

A general language translation model for translating between Welsh and English.

This model was trained using a custom DVC pipeline employing Marian NMT. The prepared datasets were generated from the following sources:

The data was split into training, validation and test sets, with the test set comprising a random slice of 20% of the total dataset. Segments were selected randomly from the text and TMX files in the datasets described above. The datasets were cleaned without any pre-tokenisation, using a SentencePiece vocabulary model, and then fed into 10 separate Marian NMT training processes, the data having been split into 10 training and validation sets.
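The split described above can be illustrated with a minimal sketch. This is not the pipeline's actual code: the function name split_segments, the seed, the 10% validation share within each shard, and the round-robin sharding are all assumptions made for illustration. Only the 20% test slice and the 10 training/validation sets come from the description.

```python
import random


def split_segments(pairs, test_fraction=0.2, n_shards=10, valid_fraction=0.1, seed=0):
    """Hold out a random test slice, then divide the rest into n_shards
    shards, each with its own training and validation set.

    valid_fraction and seed are hypothetical parameters, not taken from
    the original pipeline.
    """
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)

    # Random 20% slice of the whole dataset becomes the test set.
    n_test = int(len(shuffled) * test_fraction)
    test, rest = shuffled[:n_test], shuffled[n_test:]

    shards = []
    for i in range(n_shards):
        # Round-robin assignment keeps the shards roughly equal in size.
        shard = rest[i::n_shards]
        n_valid = max(1, int(len(shard) * valid_fraction))
        shards.append({"valid": shard[:n_valid], "train": shard[n_valid:]})
    return test, shards


# Placeholder segment pairs standing in for real Welsh/English segments.
pairs = [(f"cy segment {i}", f"en segment {i}") for i in range(1000)]
test, shards = split_segments(pairs)
print(len(test), len(shards))
```

Each entry in shards would then feed one of the 10 separate training processes, while test is kept aside for the final evaluation.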

Evaluation

The BLEU evaluation score was produced using the Python library SacreBLEU.

Usage

Ensure you have the prerequisite Python libraries installed:

pip install transformers sentencepiece

import transformers

model_id = "mgrbyte/mt-general-cy-en"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
model = transformers.AutoModelForSeq2SeqLM.from_pretrained(model_id)
translate = transformers.pipeline("translation", model=model, tokenizer=tokenizer)
translated = translate(
    "Mae gan Lywodraeth Cymru targed i gyrraedd miliwn o siariadwyr Cymraeg erbyn y flwyddyn 2020."
)
# The pipeline returns a list with one dict per input string.
print(translated[0]["translation_text"])
