license: mit
library_name: transformers

Swahili-English Translation Model

Model Details

  • Pre-trained Model: Rogendo/sw-en
  • Architecture: Transformer
  • Training Data: 210,000 Swahili-English parallel pairs
  • Base Model: Helsinki-NLP/opus-mt-en-swc
  • Training Method: Fine-tuned with an emphasis on bidirectional translation between Swahili and English.

Model Description

This model translates between Swahili, one of Africa's most widely spoken languages, and English. It is built on the Transformer architecture and was fine-tuned on a mix of parallel corpora sourced from OPUS.

  • Developed by: Peter Rogendo, Frederick Kioko
  • Model Type: Transformer
  • Languages: Swahili, English
  • License: Distributed under the MIT License

Training Data

The model was fine-tuned on the following datasets:

  • WikiMatrix:

    • Package: WikiMatrix.en-sw in Moses format
    • License: CC-BY-SA 4.0
    • Citation: Holger Schwenk et al., WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia, arXiv, July 2019.
  • ParaCrawl:

    • Package: ParaCrawl.en-sw in Moses format
    • License: CC0
    • Acknowledgement: Please acknowledge the ParaCrawl project when using this data.
  • TICO-19:

    • Package: tico-19.en-sw in Moses format
    • License: CC0
    • Citation: J. Tiedemann, 2012, Parallel Data, Tools, and Interfaces in OPUS.
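
All three packages are listed in Moses format, which is generally distributed as two line-aligned plain-text files, one per language, where line N of one file is the translation of line N of the other. The following is a minimal sketch of reading such a pair of files into (Swahili, English) tuples. The function name `read_moses_pairs` and any file paths passed to it are illustrative assumptions; this is not the authors' preprocessing code.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_moses_pairs(sw_path: str, en_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (swahili, english) pairs from two line-aligned text files.

    Hypothetical helper: assumes one sentence per line and that line N in
    each file belongs to the same pair.
    """
    with open(sw_path, encoding="utf-8") as sw_file, \
            open(en_path, encoding="utf-8") as en_file:
        paired = zip_longest(sw_file, en_file)
        for line_no, (sw_line, en_line) in enumerate(paired, start=1):
            # zip_longest pads the shorter file with None, which means the
            # two files do not have the same number of lines.
            if sw_line is None or en_line is None:
                raise ValueError(
                    f"Files are not line-aligned: mismatch at line {line_no}"
                )
            sw_text, en_text = sw_line.strip(), en_line.strip()
            # Skip pairs where either side is blank.
            if sw_text and en_text:
                yield sw_text, en_text
```

A call such as `list(read_moses_pairs("corpus.sw", "corpus.en"))`, with placeholder file names, would return the usable pairs in file order.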

Usage

Using a Pipeline as a High-Level Helper

from transformers import pipeline

# Initialize the translation pipeline
translator = pipeline("translation", model="Bildad/Swahili-English_Translation")

# Translate text
translation = translator("Habari yako?")[0]
translated_text = translation["translation_text"]

print(translated_text)
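
Translating Several Sentences at Once

A translation pipeline can also be called with a list of strings, in which case it returns one result dictionary per input. The sketch below wraps that behaviour in a small helper. The name `translate_batch` is an illustrative assumption and not part of the model or of the transformers library; the translator is passed in as an argument so the helper works with any pipeline object.

```python
from typing import Callable, List, Sequence


def translate_batch(translator: Callable, sentences: Sequence[str]) -> List[str]:
    """Translate several sentences with a single pipeline call.

    Hypothetical helper: `translator` is expected to behave like a
    transformers translation pipeline, returning a list of dicts that
    each contain a "translation_text" key.
    """
    # Drop blank inputs so the model is not asked to translate empty text.
    cleaned = [s.strip() for s in sentences if s.strip()]
    if not cleaned:
        return []
    results = translator(cleaned)
    return [result["translation_text"] for result in results]


# Example usage (downloads the model on first run):
# from transformers import pipeline
# translator = pipeline("translation", model="Bildad/Swahili-English_Translation")
# print(translate_batch(translator, ["Habari yako?", "Asante sana."]))
```

Note that blank inputs are dropped, so the output list can be shorter than the input list.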