---
license: mit
---
# Swahili-English Translation Model
## Model Details
- **Pre-trained Model**: Rogendo/sw-en
- **Fine-tuned On**:
  - **Corpus Name**: WikiMatrix
    - **Package**: WikiMatrix.en-sw in Moses format
    - **Website**: [WikiMatrix](http://opus.nlpl.eu/WikiMatrix-v1.php)
    - **Release**: v1
    - **Release Date**: Wed Nov 4 15:07:29 EET 2020
    - **License**: [CC-BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/legalcode)
    - **Citation**: Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong and Paco Guzman, WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia, arXiv, July 11 2019.
  - **Corpus Name**: ParaCrawl
    - **Package**: ParaCrawl.en-sw in Moses format
    - **Website**: [ParaCrawl](http://opus.nlpl.eu/ParaCrawl-v9.php)
    - **Release**: v9
    - **Release Date**: Fri Mar 25 12:20:25 EET 2022
    - **License**: [CC0](http://paracrawl.eu/download.html)
    - **Acknowledgement**: Please acknowledge the ParaCrawl project at [ParaCrawl](http://paracrawl.eu) and OPUS for the service.
  - **Corpus Name**: TICO-19
    - **Package**: tico-19.en-sw in Moses format
    - **Website**: [TICO-19](http://opus.nlpl.eu/tico-19-v2020-10-28.php)
    - **Release**: v2020-10-28
    - **Release Date**: Wed Oct 28 08:44:31 EET 2020
    - **License**: [CC0](https://tico-19.github.io/LICENSE.md)
    - **Citation**: J. Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012).
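
The corpora listed above are distributed in Moses format: a pair of plain-text files, one per language, where line *i* of one file is the translation of line *i* of the other. A minimal sketch of reading such a pair in Python might look like the following. The helper `read_moses_pairs` is hypothetical (it is not part of any library), and the file names in the usage comment are assumptions based on the package names, not verified paths.

```python
from typing import Iterator, Tuple


def read_moses_pairs(src_path: str, tgt_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from two line-aligned text files.

    Hypothetical helper: pairs where either side is empty are skipped.
    """
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        # zip() walks both files in lockstep, so line i is paired with line i.
        for src_line, tgt_line in zip(src, tgt):
            src_text, tgt_text = src_line.strip(), tgt_line.strip()
            if src_text and tgt_text:
                yield src_text, tgt_text


# Usage (file names are assumed, adjust to the extracted archive):
# for sw, en in read_moses_pairs("WikiMatrix.en-sw.sw", "WikiMatrix.en-sw.en"):
#     print(sw, "->", en)
```

Reading lazily with a generator keeps memory use flat, which matters for a web-scale corpus such as ParaCrawl.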
## Model Description
- **Developed By**: Bildad Otieno
- **Model Type**: Transformer
- **Language(s)**: Swahili and English
- **License**: Distributed under the MIT License
- **Training Data**: The model was fine-tuned on three Swahili-English corpora from OPUS: WikiMatrix (sentences mined from Wikipedia), ParaCrawl (web-crawled text), and TICO-19 (COVID-19 related content). Together they cover several translation domains.

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

# Initialize the translation pipeline
translator = pipeline("translation", model="Bildad/Swahili-English_Translation")

# Translate text
translation = translator("Habari yako?")[0]
translated_text = translation["translation_text"]
print(translated_text)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Bildad/Swahili-English_Translation")
model = AutoModelForSeq2SeqLM.from_pretrained("Bildad/Swahili-English_Translation")
```

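
The direct-loading snippet stops once the tokenizer and model are in memory. As a minimal sketch of one way to translate a batch of sentences with them: `translate_batch` and `main` are hypothetical helpers, and the second sample sentence and the `max_new_tokens` value are illustrative assumptions rather than the author's settings.

```python
from typing import List


def translate_batch(texts: List[str], tokenizer, model, max_new_tokens: int = 128) -> List[str]:
    """Translate a list of sentences with an already loaded tokenizer and seq2seq model."""
    # Pad to the longest sentence so the batch forms a rectangular tensor.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # generate() returns token ids; max_new_tokens caps the output length (assumed value).
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop padding and end-of-sequence tokens when converting ids back to text.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)


def main() -> None:
    # Imported here so translate_batch stays importable without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    model_id = "Bildad/Swahili-English_Translation"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # "Asante sana." is an illustrative extra input, not from the model card.
    for text in translate_batch(["Habari yako?", "Asante sana."], tokenizer, model):
        print(text)
```

Calling `main()` downloads the model on first use. Compared with the pipeline, this route exposes the generation arguments directly, which helps when tuning output length or batching.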
## Model Card Authors
Bildad Otieno
## Model Card Contact
[email protected]