---
license: mit
---

# Swahili-English Translation Model

## Model Details

- **Pre-trained Model**: Rogendo/sw-en
- **Fine-tuned On**: the OPUS parallel corpora listed below (a data-loading sketch follows the list)

**WikiMatrix**

- **Package**: WikiMatrix.en-sw in Moses format
- **Website**: [WikiMatrix](http://opus.nlpl.eu/WikiMatrix-v1.php)
- **Release**: v1
- **Release Date**: Wed Nov 4 15:07:29 EET 2020
- **License**: [CC-BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/legalcode)
- **Citation**: Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong and Paco Guzman, "WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia", arXiv, July 11, 2019.

**ParaCrawl**

- **Package**: ParaCrawl.en-sw in Moses format
- **Website**: [ParaCrawl](http://opus.nlpl.eu/ParaCrawl-v9.php)
- **Release**: v9
- **Release Date**: Fri Mar 25 12:20:25 EET 2022
- **License**: [CC0](http://paracrawl.eu/download.html)
- **Acknowledgement**: Please acknowledge the [ParaCrawl](http://paracrawl.eu) project and OPUS for providing the corpus.

**TICO-19**

- **Package**: tico-19.en-sw in Moses format
- **Website**: [TICO-19](http://opus.nlpl.eu/tico-19-v2020-10-28.php)
- **Release**: v2020-10-28
- **Release Date**: Wed Oct 28 08:44:31 EET 2020
- **License**: [CC0](https://tico-19.github.io/LICENSE.md)
- **Citation**: J. Tiedemann, 2012, "Parallel Data, Tools and Interfaces in OPUS". In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012).
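
All three packages are distributed by OPUS in Moses format: two plain-text files, one sentence per line, with the English and Swahili files aligned line by line. A minimal loading sketch is shown below; the file names are assumptions based on the usual OPUS naming scheme and are not files shipped with this model.

```python
# Minimal sketch: pair up the two line-aligned plain-text files from an
# OPUS "Moses format" package. Adjust the (assumed) file names to whatever
# the downloaded archive actually contains.
from pathlib import Path

def load_moses_pairs(en_path: str, sw_path: str):
    """Return aligned Swahili-English sentence pairs, one dict per line."""
    en_lines = Path(en_path).read_text(encoding="utf-8").splitlines()
    sw_lines = Path(sw_path).read_text(encoding="utf-8").splitlines()
    assert len(en_lines) == len(sw_lines), "Moses files must be line-aligned"
    return [
        {"sw": sw.strip(), "en": en.strip()}
        for sw, en in zip(sw_lines, en_lines)
        if sw.strip() and en.strip()  # drop empty lines
    ]

# Hypothetical file names following the usual OPUS naming scheme:
# pairs = load_moses_pairs("WikiMatrix.en-sw.en", "WikiMatrix.en-sw.sw")
```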
## Model Description

- **Developed By**: Bildad Otieno
- **Model Type**: Transformer-based sequence-to-sequence translation model
- **Language(s)**: Swahili and English
- **License**: Distributed under the MIT License
- **Training Data**: The model was fine-tuned on parallel corpora from OPUS (WikiMatrix, ParaCrawl, and TICO-19), which together provide a diverse range of Swahili-English translation examples. An illustrative fine-tuning sketch follows this list.
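
For context, the sketch below shows one way such sentence pairs could be used to fine-tune the Rogendo/sw-en checkpoint with Hugging Face's `Seq2SeqTrainer`. The tiny placeholder dataset, the hyperparameters, and the overall setup are illustrative assumptions, not the exact recipe used to train this model.

```python
# Illustrative fine-tuning sketch (not the exact recipe used for this model):
# tokenize Swahili -> English pairs and train with Seq2SeqTrainer.
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

base = "Rogendo/sw-en"  # pre-trained checkpoint this model starts from
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSeq2SeqLM.from_pretrained(base)

# `pairs` would normally come from the load_moses_pairs() helper sketched
# above; a single placeholder pair is used here so the script runs end to end.
pairs = [{"sw": "Habari yako?", "en": "How are you?"}]
dataset = Dataset.from_dict({
    "sw": [p["sw"] for p in pairs],
    "en": [p["en"] for p in pairs],
})

def preprocess(batch):
    # Swahili is the source text, English the target.
    model_inputs = tokenizer(batch["sw"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["en"], truncation=True, max_length=128)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=["sw", "en"])

# Hyperparameters below are illustrative, not the values used for this model.
args = Seq2SeqTrainingArguments(
    output_dir="sw-en-finetuned",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=2e-5,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```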
## How to Use

Use a pipeline as a high-level helper:

```python
from transformers import pipeline

# Initialize the translation pipeline
translator = pipeline("translation", model="Bildad/Swahili-English_Translation")

# Translate text
translation = translator("Habari yako?")[0]
translated_text = translation["translation_text"]

print(translated_text)
```
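
Continuing from the `translator` pipeline above, a list of sentences can be passed in a single call and the pipeline returns one result per input. The example sentences below are illustrative.

```python
# Translate several Swahili sentences in one call.
sentences = ["Habari ya asubuhi.", "Karibu nyumbani."]
results = translator(sentences)
for src, out in zip(sentences, results):
    print(f"{src} -> {out['translation_text']}")
```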
Load the model directly:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Bildad/Swahili-English_Translation")
model = AutoModelForSeq2SeqLM.from_pretrained("Bildad/Swahili-English_Translation")
```
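
Continuing from the tokenizer and model loaded above, a minimal generation example is sketched below; the generation settings are illustrative.

```python
# Tokenize a Swahili sentence, generate the English translation, and decode.
inputs = tokenizer("Habari yako?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```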
## Model Card Authors

Bildad Otieno

## Model Card Contact

[email protected]