license: mit

Swahili-English Translation Model

Model Details

  • Pre-trained Model: Rogendo/sw-en

  • Fine-tuned On:

    • Corpus Name: WikiMatrix

      • Package: WikiMatrix.en-sw in Moses format
      • Website: WikiMatrix
      • Release: v1
      • Release Date: Wed Nov 4 15:07:29 EET 2020
      • License: CC-BY-SA 4.0
      • Citation: Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong and Paco Guzman, WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia, arXiv, July 11 2019.
    • Corpus Name: ParaCrawl

      • Package: ParaCrawl.en-sw in Moses format
      • Website: ParaCrawl
      • Release: v9
      • Release Date: Fri Mar 25 12:20:25 EET 2022
      • License: CC0
      • Acknowledgement: Please acknowledge the ParaCrawl project and OPUS for the service.
    • Corpus Name: TICO-19

      • Package: tico-19.en-sw in Moses format
      • Website: TICO-19
      • Release: v2020-10-28
      • Release Date: Wed Oct 28 08:44:31 EET 2020
      • License: CC0
      • Citation: J. Tiedemann, 2012, Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012).

Model Description

  • Developed By: Bildad Otieno
  • Model Type: Transformer
  • Language(s): Swahili and English
  • License: Distributed under the MIT License
  • Training Data: The model was fine-tuned on three English-Swahili parallel corpora distributed through OPUS: WikiMatrix (sentence pairs mined from Wikipedia), ParaCrawl (web-crawled text), and TICO-19 (COVID-19 related content). Together they cover encyclopedic, general web, and health-domain translation examples.

Use a pipeline as a high-level helper

    from transformers import pipeline
    
    # Initialize the translation pipeline
    translator = pipeline("translation", model="Bildad/Swahili-English_Translation")
    
    # Translate text
    translation = translator("Habari yako?")[0]
    translated_text = translation["translation_text"]
    
    print(translated_text)
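To translate several sentences at once, the pipeline can be called with a list instead of a single string. The following is a minimal sketch, not part of the original model card: the helper name `translate_many` and the default `batch_size` of 8 are illustrative assumptions, and `translator` is expected to be the pipeline object created above.

```python
from typing import Callable, List


def translate_many(translator: Callable, sentences: List[str], batch_size: int = 8) -> List[str]:
    """Translate several sentences in one pipeline call and return plain strings.

    `translate_many` is a hypothetical helper; `batch_size=8` is an assumed default.
    """
    # A transformers pipeline accepts a list of inputs and returns one dict per input.
    results = translator(sentences, batch_size=batch_size)
    # Each result dict carries the translated string under "translation_text".
    return [item["translation_text"] for item in results]
```

With the pipeline from the example above, `translate_many(translator, ["Habari yako?", "Asante sana."])` should return a list of two English strings, in the same order as the inputs.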

Load model directly

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    
    tokenizer = AutoTokenizer.from_pretrained("Bildad/Swahili-English_Translation")
    model = AutoModelForSeq2SeqLM.from_pretrained("Bildad/Swahili-English_Translation")
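Loading the tokenizer and model directly does not translate anything by itself; the text still has to be tokenized, passed through `generate`, and decoded. The sketch below shows one way to do that, assuming the checkpoint behaves like a standard sequence-to-sequence model in `transformers` with PyTorch installed. The helper names `translate_batch` and `main`, the second example sentence, and the `max_new_tokens` value of 128 are illustrative assumptions, not settings taken from this model card.

```python
from typing import List


def translate_batch(texts: List[str], tokenizer, model, max_new_tokens: int = 128) -> List[str]:
    """Translate Swahili sentences with an already-loaded tokenizer and model.

    Hypothetical helper; max_new_tokens=128 is an assumed cap, not a documented setting.
    """
    # Tokenize the whole batch; padding aligns sentences of different lengths.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # generate() runs the encoder-decoder and returns output token ids.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode token ids back to text, dropping padding and end-of-sequence tokens.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)


def main() -> None:
    # Imported here so translate_batch can be reused without loading transformers.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    name = "Bildad/Swahili-English_Translation"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    for line in translate_batch(["Habari yako?", "Asante sana."], tokenizer, model):
        print(line)
```

Calling `main()` downloads the checkpoint on first use and prints one translation per input sentence.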

Model Card Authors

Bildad Otieno

Model Card Contact

[email protected]