---
license: mit
---
# Swahili-English Translation Model

## Model Details
**Pre-trained Model:** Rogendo/sw-en

**Fine-tuned On:**

**Corpus Name:** WikiMatrix
- Package: WikiMatrix.en-sw in Moses format
- Website: WikiMatrix
- Release: v1
- Release Date: Wed Nov 4 15:07:29 EET 2020
- License: CC-BY-SA 4.0
- Citation: Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong and Paco Guzman, WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia, arXiv, July 11 2019.
**Corpus Name:** ParaCrawl

**Corpus Name:** TICO-19
- Package: tico-19.en-sw in Moses format
- Website: TICO-19
- Release: v2020-10-28
- Release Date: Wed Oct 28 08:44:31 EET 2020
- License: CC0
- Citation: J. Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012).
## Model Description
- Developed By: Bildad Otieno
- Model Type: Transformer
- Language(s): Swahili and English
- License: Distributed under the MIT License
- Training Data: The model was fine-tuned using a collection of datasets from OPUS, including WikiMatrix, ParaCrawl, and TICO-19. The datasets provide a diverse range of translation examples between Swahili and English.
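The exact training recipe is not reproduced in this card. The following is a minimal fine-tuning sketch using the Hugging Face `Seq2SeqTrainer`; the column names (`"sw"`, `"en"`), dataset construction, and hyperparameters are illustrative assumptions, not the settings used to train this model.

```python
# Illustrative only: column names, splits, and hyperparameters are assumptions.
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# Start from the pre-trained Swahili-English model named in this card.
tokenizer = AutoTokenizer.from_pretrained("Rogendo/sw-en")
model = AutoModelForSeq2SeqLM.from_pretrained("Rogendo/sw-en")

def preprocess(batch):
    # Assumes each example holds parallel "sw" (source) and "en" (target) text.
    model_inputs = tokenizer(batch["sw"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["en"], truncation=True, max_length=128)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

training_args = Seq2SeqTrainingArguments(
    output_dir="swahili-english-finetuned",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=2e-5,
    predict_with_generate=True,
)

# train_dataset / eval_dataset would be built from the OPUS corpora
# (WikiMatrix, ParaCrawl, TICO-19) and mapped through `preprocess`.
# trainer = Seq2SeqTrainer(
#     model=model,
#     args=training_args,
#     train_dataset=train_dataset,
#     eval_dataset=eval_dataset,
#     data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
# )
# trainer.train()
```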
## Usage

Use a pipeline as a high-level helper:

```python
from transformers import pipeline

# Initialize the translation pipeline
translator = pipeline("translation", model="Bildad/Swahili-English_Translation")

# Translate text
translation = translator("Habari yako?")[0]
translated_text = translation["translation_text"]
print(translated_text)
```
Load the model directly:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Bildad/Swahili-English_Translation")
model = AutoModelForSeq2SeqLM.from_pretrained("Bildad/Swahili-English_Translation")
```
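When loading the model directly, tokenization, generation, and decoding are done explicitly. A minimal sketch (the example sentence and generation length are illustrative):

```python
# Tokenize a Swahili sentence, generate the English translation, and decode it.
inputs = tokenizer("Habari yako?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```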
## Model Card Authors
Bildad Otieno