---
license: mit
library_name: transformers
---

# Swahili-English Translation Model

## Model Details

- **Pre-trained Model**: Rogendo/sw-en
- **Architecture**: Transformer
- **Training Data**: Fine-tuned on 210,000 Swahili-English parallel sentence pairs
- **Base Model**: Helsinki-NLP/opus-mt-en-swc
- **Training Method**: Fine-tuned with an emphasis on bidirectional translation between Swahili and English.

### Model Description

This model translates between Swahili, one of Africa's most widely spoken languages, and English. It is built on the Transformer architecture and was fine-tuned on a diverse set of parallel corpora sourced from OPUS.

- **Developed by:** Peter Rogendo, Frederick Kioko
- **Model Type:** Transformer
- **Languages:** Swahili, English
- **License:** Distributed under the MIT License

### Training Data

The model was fine-tuned on the following datasets:
  
  - **WikiMatrix:**
    - **Package**: WikiMatrix.en-sw in Moses format
    - **License**: [CC-BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/legalcode)
    - **Citation**: Holger Schwenk et al., WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia, arXiv, July 2019.

  - **ParaCrawl:**
    - **Package**: ParaCrawl.en-sw in Moses format
    - **License**: [CC0](http://paracrawl.eu/download.html)
    - **Acknowledgement**: Please acknowledge the ParaCrawl project at [ParaCrawl](http://paracrawl.eu).

  - **TICO-19:**
    - **Package**: tico-19.en-sw in Moses format
    - **License**: [CC0](https://tico-19.github.io/LICENSE.md)
    - **Citation**: J. Tiedemann, 2012, Parallel Data, Tools, and Interfaces in OPUS.
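
For readers who have not worked with Moses-format packages: the sketch below assumes the usual OPUS layout, in which each package unpacks to two plain-text files (one per language) aligned line by line. The function name `read_moses_pairs` and the file names in the usage note are illustrative assumptions, not part of this model's training code.

```python
from typing import Iterator, Tuple


def read_moses_pairs(src_path: str, tgt_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from two line-aligned text files.

    Assumes line i of src_path is the translation of line i of tgt_path.
    Pairs in which either side is empty after stripping are skipped.
    """
    with open(src_path, encoding="utf-8") as src_file, \
            open(tgt_path, encoding="utf-8") as tgt_file:
        # zip stops at the shorter file, so a trailing mismatch is ignored.
        for src_line, tgt_line in zip(src_file, tgt_file):
            src_text, tgt_text = src_line.strip(), tgt_line.strip()
            if src_text and tgt_text:
                yield src_text, tgt_text
```

With an unpacked package, a call might look like `pairs = list(read_moses_pairs("WikiMatrix.en-sw.sw", "WikiMatrix.en-sw.en"))`, where both file names are assumed and should be checked against the actual download.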

## Usage

### Using a Pipeline as a High-Level Helper

```python
from transformers import pipeline

# Initialize the translation pipeline
translator = pipeline("translation", model="Bildad/Swahili-English_Translation")

# Translate text
translation = translator("Habari yako?")[0]
translated_text = translation["translation_text"]

print(translated_text)
```