---
license: mit
---
# Swahili-English Translation Model

## Model Details

- **Pre-trained Model**: Rogendo/sw-en
- **Fine-tuned On**: 

  - **Corpus Name**: WikiMatrix
    - **Package**: WikiMatrix.en-sw in Moses format
    - **Website**: [WikiMatrix](http://opus.nlpl.eu/WikiMatrix-v1.php)
    - **Release**: v1
    - **Release Date**: Wed Nov 4 15:07:29 EET 2020
    - **License**: [CC-BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/legalcode)
    - **Citation**: Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong and Paco Guzman, WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia, arXiv, July 11 2019.

  - **Corpus Name**: ParaCrawl
    - **Package**: ParaCrawl.en-sw in Moses format
    - **Website**: [ParaCrawl](http://opus.nlpl.eu/ParaCrawl-v9.php)
    - **Release**: v9
    - **Release Date**: Fri Mar 25 12:20:25 EET 2022
    - **License**: [CC0](http://paracrawl.eu/download.html)
    - **Acknowledgement**: Please acknowledge the ParaCrawl project at [ParaCrawl](http://paracrawl.eu) and OPUS for the service.

  - **Corpus Name**: TICO-19
    - **Package**: tico-19.en-sw in Moses format
    - **Website**: [TICO-19](http://opus.nlpl.eu/tico-19-v2020-10-28.php)
    - **Release**: v2020-10-28
    - **Release Date**: Wed Oct 28 08:44:31 EET 2020
    - **License**: [CC0](https://tico-19.github.io/LICENSE.md)
    - **Citation**: J. Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012).
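
The three packages above are distributed in Moses format, which stores a parallel corpus as two plain-text files with one sentence per line, aligned by line number. As a minimal sketch of reading such a pair of files, assuming hypothetical file names like `tico-19.en-sw.sw` and `tico-19.en-sw.en` (the helper `read_moses_pairs` is illustrative and not part of this model's training code):

```python
from typing import Iterator, Tuple


def read_moses_pairs(src_path: str, tgt_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from two line-aligned text files."""
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        for src_line, tgt_line in zip(src, tgt):
            src_text, tgt_text = src_line.strip(), tgt_line.strip()
            # Skip pairs where either side is empty; they carry no training signal.
            if src_text and tgt_text:
                yield src_text, tgt_text
```

Calling `list(read_moses_pairs("tico-19.en-sw.sw", "tico-19.en-sw.en"))` would then give a list of Swahili-English tuples, provided the files exist under those (assumed) names.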

## Model Description

- **Developed By**: Bildad Otieno
- **Model Type**: Transformer
- **Language(s)**: Swahili and English
- **License**: Distributed under the MIT License
- **Training Data**: The model was fine-tuned on three OPUS corpora: WikiMatrix (sentence pairs mined from Wikipedia), ParaCrawl (web-crawled text), and TICO-19 (COVID-19 related content). Together they cover several domains of translation between Swahili and English.

## Use a pipeline as a high-level helper

        from transformers import pipeline
        
        # Initialize the translation pipeline
        translator = pipeline("translation", model="Bildad/Swahili-English_Translation")
        
        # Translate text
        translation = translator("Habari yako?")[0]
        translated_text = translation["translation_text"]
        
        print(translated_text)
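
A translation pipeline also accepts a list of strings and returns one result dictionary per input. The helper below is a small sketch of batch use; `translate_batch` is a hypothetical name, and it expects to receive the `translator` object created above.

```python
from typing import Callable, List


def translate_batch(translator: Callable, sentences: List[str]) -> List[str]:
    """Translate several sentences in one call and return only the text."""
    # The pipeline returns a list of dicts, each with a "translation_text" key.
    results = translator(sentences)
    return [item["translation_text"] for item in results]
```

For example, `translate_batch(translator, ["Habari yako?", "Asante sana."])` returns a list with two English strings, in the same order as the inputs.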

## Load model directly

        from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
        
        tokenizer = AutoTokenizer.from_pretrained("Bildad/Swahili-English_Translation")
        model = AutoModelForSeq2SeqLM.from_pretrained("Bildad/Swahili-English_Translation")
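
Loading the tokenizer and model only prepares them; producing a translation still requires tokenizing, generating, and decoding. The following is a minimal sketch of those three steps, assuming the `tokenizer` and `model` objects loaded above. The function name `translate` and the default `max_new_tokens=128` are illustrative choices, not settings documented for this model.

```python
from typing import List


def translate(texts: List[str], tokenizer, model, max_new_tokens: int = 128) -> List[str]:
    """Translate a list of sentences with a seq2seq model and its tokenizer."""
    # Tokenize as a padded batch of PyTorch tensors.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # Generate output token IDs; max_new_tokens caps the translation length (assumed value).
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Convert token IDs back to text, dropping padding and end-of-sequence markers.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

With the objects from the snippet above, `translate(["Habari yako?"], tokenizer, model)` returns a one-element list containing the English translation.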

## Model Card Authors 

Bildad Otieno

## Model Card Contact

[email protected]