Bildad/Swahili-English_Translation

Text2Text Generation · Transformers · Safetensors · marian

Bildad committed 24 days ago

Commit 58198ed · 1 Parent(s): c82f820

Update README.md

Files changed (1)

README.md +29 -56

@@ -1,85 +1,58 @@
 ---
 license: mit
 ---
 # Swahili-English Translation Model
 ## Model Details
 - **Pre-trained Model**: Rogendo/sw-en
-- ## Model Details
-- Transformer architecture used
-- Trained on a 210000 corpus pairs
-- Pre-trained Helsinki-NLP/opus-mt-en-swc
-- 2 models to enforce biderectional translation
 ### Model Description
-<!-- Provide a longer summary of what this model is. -->
 - **Developed by:** Peter Rogendo, Frederick Kioko
-- **Model type:** Transformer
-- **Language(s) (NLP):** Transformer, Pandas, Numpy
 - **License:** Distributed under the MIT License
-- **Finetuned from model [Helsinki-NLP/opus-mt-en-swc]:** [This pre-trained model was re-trained on a swahili-english sentence pairs that were collected across Kenya. Swahili is the national language and is among the top three of the most spoken language in Africa. The sentences that were used to train this model were 210000 in total.]
-  - **Corpus Name**: WikiMatrix
     - **Package**: WikiMatrix.en-sw in Moses format
-    - **Website**: [WikiMatrix](http://opus.nlpl.eu/WikiMatrix-v1.php)
-    - **Release**: v1
-    - **Release Date**: Wed Nov 4 15:07:29 EET 2020
     - **License**: [CC-BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/legalcode)
-    - **Citation**: Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong and Paco Guzman, WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia, arXiv, July 11 2019.
-  - **Corpus Name**: ParaCrawl
     - **Package**: ParaCrawl.en-sw in Moses format
-    - **Website**: [ParaCrawl](http://opus.nlpl.eu/ParaCrawl-v9.php)
-    - **Release**: v9
-    - **Release Date**: Fri Mar 25 12:20:25 EET 2022
     - **License**: [CC0](http://paracrawl.eu/download.html)
-    - **Acknowledgement**: Please acknowledge the ParaCrawl project at [ParaCrawl](http://paracrawl.eu) and OPUS for the service.
-  - **Corpus Name**: TICO-19
     - **Package**: tico-19.en-sw in Moses format
-    - **Website**: [TICO-19](http://opus.nlpl.eu/tico-19-v2020-10-28.php)
-    - **Release**: v2020-10-28
-    - **Release Date**: Wed Oct 28 08:44:31 EET 2020
     - **License**: [CC0](https://tico-19.github.io/LICENSE.md)
-    - **Citation**: J. Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012).
-## Model Description
-- **Developed By**: Bildad Otieno
-- **Model Type**: Transformer
-- **Language(s)**: Swahili and English
-- **License**: Distributed under the MIT License
-- **Training Data**: The model was fine-tuned using a collection of datasets from OPUS, including WikiMatrix, ParaCrawl, and TICO-19. The datasets provide a diverse range of translation examples between Swahili and English.
-# Use a pipeline as a high-level helper
-        from transformers import pipeline
-        # Initialize the translation pipeline
-        translator = pipeline("translation", model="Bildad/Swahili-English_Translation")
-        # Translate text
-        translation = translator("Habari yako?")[0]
-        translated_text = translation["translation_text"]
-        print(translated_text)
-# Load model directly
-        from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
-        tokenizer = AutoTokenizer.from_pretrained("Bildad/Swahili-English_Translation")
-        model = AutoModelForSeq2SeqLM.from_pretrained("Bildad/Swahili-English_Translation")
-## Model Card Authors
-Bildad Otieno
-## Model Card Contact
-[email protected]
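
Note on the removed "Load model directly" snippet: it stops once the tokenizer and model are loaded and never produces a translation. The following is a minimal sketch of how that path might be completed, assuming Swahili input as in the README's pipeline example and assuming PyTorch and `sentencepiece` are installed alongside `transformers`. The helper names `batched` and `translate`, the batch size, and the `max_new_tokens` value are illustrative choices, not part of the original README.

```python
from typing import Iterator, List

# Repository id taken from the README's own pipeline example.
MODEL_ID = "Bildad/Swahili-English_Translation"


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate(sentences: List[str], batch_size: int = 8, max_new_tokens: int = 128) -> List[str]:
    """Translate sentences with the tokenizer and model loaded directly.

    Hypothetical helper: batch_size and max_new_tokens are assumed defaults.
    """
    # Imported here so `batched` stays usable without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    outputs: List[str] = []
    for batch in batched(sentences, batch_size):
        # Pad to the longest sentence in the batch; truncate over-long inputs.
        encoded = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(**encoded, max_new_tokens=max_new_tokens)
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs
```

Calling `translate(["Habari yako?"])` downloads the model weights on first use and returns a list with one translated string per input sentence. Loading the model inside the function keeps the sketch short; code that translates repeatedly would likely load the tokenizer and model once and reuse them.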

 ---
 license: mit
+library_name: transformers
 ---
 # Swahili-English Translation Model
 ## Model Details
 - **Pre-trained Model**: Rogendo/sw-en
+- **Architecture**: Transformer
+- **Training Data**: Trained on 210,000 Swahili-English corpus pairs
+- **Base Model**: Helsinki-NLP/opus-mt-en-swc
+- **Training Method**: Fine-tuned with an emphasis on bidirectional translation between Swahili and English.
 ### Model Description
+This Swahili-English translation model was developed to handle translations between Swahili, one of Africa's most spoken languages, and English. It was trained on a diverse dataset sourced from OPUS, leveraging the Transformer architecture for effective translation.
 - **Developed by:** Peter Rogendo, Frederick Kioko
+- **Model Type:** Transformer
+- **Languages:** Swahili, English
 - **License:** Distributed under the MIT License
+### Training Data
+The model was fine-tuned on the following datasets:
+  - **WikiMatrix:**
     - **Package**: WikiMatrix.en-sw in Moses format
     - **License**: [CC-BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/legalcode)
+    - **Citation**: Holger Schwenk et al., WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia, arXiv, July 2019.
+  - **ParaCrawl:**
     - **Package**: ParaCrawl.en-sw in Moses format
     - **License**: [CC0](http://paracrawl.eu/download.html)
+    - **Acknowledgement**: Please acknowledge the ParaCrawl project at [ParaCrawl](http://paracrawl.eu).
+  - **TICO-19:**
     - **Package**: tico-19.en-sw in Moses format
     - **License**: [CC0](https://tico-19.github.io/LICENSE.md)
+    - **Citation**: J. Tiedemann, 2012, Parallel Data, Tools, and Interfaces in OPUS.
+## Usage
+### Using a Pipeline as a High-Level Helper
+```python
+from transformers import pipeline
+# Initialize the translation pipeline
+translator = pipeline("translation", model="Bildad/Swahili-English_Translation")
+# Translate text
+translation = translator("Habari yako?")[0]
+translated_text = translation["translation_text"]
+print(translated_text)