---
pipeline_tag: translation
language:
- multilingual
- af
- am
- ar
- en
- fr
- ha
- ig
- mg
- ny
- om
- pcm
- rn
- rw
- sn
- so
- st
- sw
- xh
- yo
- zu
license: apache-2.0
---

This is an [AfriCOMET-STL (single task)](https://github.com/masakhane-io/africomet) evaluation model. It receives a triplet (source sentence, translation, reference translation) and returns a score that reflects the quality of the translation compared to both the source and the reference.

# Paper

[AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages](https://arxiv.org/abs/2311.09828) (Wang et al., arXiv 2023)

# License

Apache-2.0

# Usage (AfriCOMET)

Using this model requires unbabel-comet to be installed:

```bash
pip install --upgrade pip  # ensures that pip is current
pip install unbabel-comet
```

Then you can use it through the comet CLI:

```bash
comet-score -s {source-inputs}.txt -t {translation-outputs}.txt -r {references}.txt --model masakhane/africomet-stl
```

Or using Python:

```python
from comet import download_model, load_from_checkpoint

# Download the checkpoint from the Hugging Face Hub and load it
model_path = download_model("masakhane/africomet-stl")
model = load_from_checkpoint(model_path)

# Each item is a (source, translation, reference) triplet
data = [
    {
        "src": "Nadal sàkọọ́lẹ̀ ìforígbárí o ní àmì méje sóódo pẹ̀lú ilẹ̀ Canada.",
        "mt": "Nadal's head to head record against the Canadian is 7–2.",
        "ref": "Nadal scored seven unanswered points against Canada."
    },
    {
        "src": "Laipe yi o padanu si Raoniki ni ere Sisi Brisbeni.",
        "mt": "He recently lost against Raonic in the Brisbane Open.",
        "ref": "He recently lost to Raoniki in the game Sisi Brisbeni."
    }
]

model_output = model.predict(data, batch_size=8, gpus=1)
print(model_output)
```

# Intended uses

Our model is intended to be used for **MT evaluation**.

Given a triplet (source sentence, translation, reference translation), it outputs a single score between 0 and 1, where 1 represents a perfect translation.

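When the inputs live in three parallel line-aligned lists (as with the three files passed to the CLI), the `data` list for the Python usage can be assembled along the following lines. This is a minimal sketch: the helper name `build_triplets` is hypothetical and not part of unbabel-comet; only the `src`/`mt`/`ref` keys come from the usage example in this card.

```python
from typing import Dict, List


def build_triplets(
    sources: List[str],
    translations: List[str],
    references: List[str],
) -> List[Dict[str, str]]:
    """Pair up line-aligned inputs into {"src", "mt", "ref"} dicts.

    Hypothetical helper for illustration; not provided by unbabel-comet.
    """
    # The three inputs must be line-aligned, one segment per entry.
    if not (len(sources) == len(translations) == len(references)):
        raise ValueError(
            "sources, translations and references must have the same length"
        )
    return [
        {"src": s.strip(), "mt": t.strip(), "ref": r.strip()}
        for s, t, r in zip(sources, translations, references)
    ]


triplets = build_triplets(
    ["Laipe yi o padanu si Raoniki ni ere Sisi Brisbeni.\n"],
    ["He recently lost against Raonic in the Brisbane Open.\n"],
    ["He recently lost to Raoniki in the game Sisi Brisbeni.\n"],
)
print(triplets[0]["mt"])
```

The resulting list can be passed to `model.predict` exactly like `data` in the usage example. In recent unbabel-comet releases the returned object is expected to expose per-segment values as `scores` and a corpus-level average as `system_score`; check the installed version if those attributes are missing.
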
# Languages Covered

This model builds on AfroXLMR, which covers the following languages:

Afrikaans, Arabic, Amharic, English, French, Hausa, Igbo, Malagasy, Chichewa, Oromo, Nigerian-Pidgin, Kinyarwanda, Kirundi, Shona, Somali, Sesotho, Swahili, isiXhosa, Yoruba, and isiZulu.

Results for language pairs that include a language outside this list are therefore unreliable!
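
One way to guard against scoring uncovered pairs is to check language codes before calling the model. The sketch below is an illustration only: the set reuses the language codes from this card's metadata, while `COVERED_LANGUAGES` and `pair_is_covered` are hypothetical names, not part of AfriCOMET or unbabel-comet.

```python
# Language codes taken from this model card's metadata block.
COVERED_LANGUAGES = frozenset({
    "af", "am", "ar", "en", "fr", "ha", "ig", "mg", "ny", "om",
    "pcm", "rn", "rw", "sn", "so", "st", "sw", "xh", "yo", "zu",
})


def pair_is_covered(src_lang: str, tgt_lang: str) -> bool:
    """Return True only if both languages of the pair are in the covered set.

    Hypothetical helper for illustration.
    """
    return (
        src_lang.lower() in COVERED_LANGUAGES
        and tgt_lang.lower() in COVERED_LANGUAGES
    )


print(pair_is_covered("yo", "en"))  # Yoruba and English are both covered
print(pair_is_covered("en", "de"))  # German is not in the covered set
```

A pipeline could skip or flag segments whose pair fails this check rather than report a score that may not be meaningful.
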