---
license: mit
language:
- sw
- en
pipeline_tag: translation
tags:
- code
---

# Swahili-English Translation Model

## Model Details

- **Pre-trained Model**: Rogendo/sw-en
- **Fine-tuned On**:
  - **Corpus Name**: WikiMatrix
    - **Package**: WikiMatrix.en-sw in Moses format
    - **Website**: [WikiMatrix](http://opus.nlpl.eu/WikiMatrix-v1.php)
    - **Release**: v1
    - **Release Date**: Wed Nov 4 15:07:29 EET 2020
    - **License**: [CC-BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/legalcode)
    - **Citation**: Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong and Paco Guzman, WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia, arXiv, July 11 2019.
  - **Corpus Name**: ParaCrawl
    - **Package**: ParaCrawl.en-sw in Moses format
    - **Website**: [ParaCrawl](http://opus.nlpl.eu/ParaCrawl-v9.php)
    - **Release**: v9
    - **Release Date**: Fri Mar 25 12:20:25 EET 2022
    - **License**: [CC0](http://paracrawl.eu/download.html)
    - **Acknowledgement**: Please acknowledge the ParaCrawl project at [ParaCrawl](http://paracrawl.eu) and OPUS for the service.
  - **Corpus Name**: TICO-19
    - **Package**: tico-19.en-sw in Moses format
    - **Website**: [TICO-19](http://opus.nlpl.eu/tico-19-v2020-10-28.php)
    - **Release**: v2020-10-28
    - **Release Date**: Wed Oct 28 08:44:31 EET 2020
    - **License**: [CC0](https://tico-19.github.io/LICENSE.md)
    - **Citation**: J. Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012).

## Model Description

- **Developed By**: Bildad Otieno
- **Model Type**: Transformer
- **Language(s)**: Swahili and English
- **License**: Distributed under the MIT License
- **Training Data**: The model was fine-tuned on parallel corpora from OPUS (WikiMatrix, ParaCrawl, and TICO-19), which together provide a diverse range of Swahili-English translation examples.
## How to Use

Use a pipeline as a high-level helper:

```python
from transformers import pipeline

# Initialize the translation pipeline
translator = pipeline("translation", model="Bildad/Swahili-English_Translation")

# Translate text
translation = translator("Habari yako?")[0]
translated_text = translation["translation_text"]
print(translated_text)
```

Or load the model directly:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Bildad/Swahili-English_Translation")
model = AutoModelForSeq2SeqLM.from_pretrained("Bildad/Swahili-English_Translation")
```

## Model Card Authors

Bildad Otieno

## Model Card Contact

bildadmoses8@gmail.com
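When the model and tokenizer are loaded directly rather than through a pipeline, translation is not automatic: the input must be tokenized, passed through `model.generate`, and decoded back to text. Below is a minimal sketch of that flow; the `max_length=128` value is an illustrative choice, not a documented default of this model.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Bildad/Swahili-English_Translation")
model = AutoModelForSeq2SeqLM.from_pretrained("Bildad/Swahili-English_Translation")

# Tokenize the Swahili input into PyTorch tensors
inputs = tokenizer("Habari yako?", return_tensors="pt")

# Run sequence-to-sequence generation
outputs = model.generate(**inputs, max_length=128)

# Decode the generated token IDs back into English text
translated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(translated_text)
```

This is equivalent to what the pipeline does internally, but gives direct control over generation parameters such as beam size or maximum length.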