nllb-200-600M-En-Ar
This model is a fine-tuned version of NLLB-200 distilled 600M (facebook/nllb-200-distilled-600M), adapted for translation from English to Egyptian Arabic. It was fine-tuned on a custom dataset of 12,000 translation pairs and aims to produce high-quality translations that capture the nuances and colloquial expressions of Egyptian Arabic.
The fine-tuning data was collected from high-quality video transcriptions, with the aim of providing rich, contextually accurate language data.
Model Details
- Base Model: facebook/nllb-200-distilled-600M
- Language Pair: English to Egyptian Arabic
- Dataset: 12,000 custom translation pairs
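The card does not state how the 12,000 pairs were divided into training and validation data. Below is a minimal sketch of a reproducible hold-out split. It assumes the pairs are stored as (English, Egyptian Arabic) string tuples. The function name train_val_split, the 10% validation fraction, and the seed are illustrative assumptions, not the author's settings.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical representation of one training example: (English source, Egyptian Arabic target)
Pair = Tuple[str, str]


def train_val_split(pairs: Sequence[Pair], val_fraction: float = 0.1,
                    seed: int = 42) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle a copy of `pairs` with a fixed seed and hold out `val_fraction` of it (hypothetical helper)."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = list(pairs)                 # copy so the caller's data is not reordered
    random.Random(seed).shuffle(shuffled)  # local RNG keeps the split reproducible
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]  # (train, validation)
```

With 12,000 pairs and the default fraction, this yields 10,800 training pairs and 1,200 validation pairs. Calling it again with the same seed returns the same split.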
Usage
To use this model for translation, load it with the transformers library:
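from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Mhassanen/nllb-200-600M-En-Ar"
# NLLB uses FLORES-200 language codes: English (eng_Latn) -> Egyptian Arabic (arz_Arab)
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn", tgt_lang="arz_Arab")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def translate(text):
    # Accepts a string or a list of strings; padding lets a list be batched together
    inputs = tokenizer(text, return_tensors="pt", padding=True)
    # Force the first generated token to the Egyptian Arabic language code, as is standard for NLLB
    tgt_id = tokenizer.convert_tokens_to_ids("arz_Arab")
    translated_tokens = model.generate(**inputs, forced_bos_token_id=tgt_id)
    # batch_decode returns a list with one translation per input
    return tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)

text = "Hello, how are you?"
print(translate(text))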
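For translating many sentences, the same steps can be run in fixed-size batches to bound memory use. Below is a minimal sketch that assumes tokenizer and model are loaded as in the usage example above. The helper names chunks and translate_batch and the defaults batch_size=16 and max_new_tokens=128 are illustrative choices, not part of the model's API. The target language is forced with forced_bos_token_id, following the usual NLLB generation convention.

```python
from typing import Iterable, Iterator, List, Sequence


def chunks(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_batch(texts: Iterable[str], tokenizer, model, tgt_lang: str = "arz_Arab",
                    batch_size: int = 16, max_new_tokens: int = 128) -> List[str]:
    """Translate `texts` batch by batch and return one string per input (hypothetical helper)."""
    # NLLB decodes into the language whose code token is generated first,
    # so force that token to the target language (FLORES-200 code).
    tgt_id = tokenizer.convert_tokens_to_ids(tgt_lang)
    results: List[str] = []
    for batch in chunks(list(texts), batch_size):
        # Pad to the longest sentence in the batch; truncate inputs over the tokenizer's max length.
        inputs = tokenizer(list(batch), return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(**inputs, forced_bos_token_id=tgt_id,
                                   max_new_tokens=max_new_tokens)
        results.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return results
```

For example, translate_batch(["Good morning", "Where is the station?"], tokenizer, model) returns a list of two Egyptian Arabic strings in input order. Keeping the tokenizer and model as parameters avoids reloading them on every call.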
Performance
The model was evaluated on a validation set to check translation quality. It is designed to capture colloquial Egyptian Arabic, and additional data and ongoing improvements may further enhance its performance.
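The card does not report validation scores. Character-level metrics such as chrF are commonly used for morphologically rich languages like Arabic. The sketch below is a simplified, hypothetical chrF-style score: it averages character n-gram precision and recall over n = 1 to 6 (ignoring whitespace) and combines them with an F-beta score where beta = 2. It is not a reference implementation. For numbers comparable with published results, an established implementation such as the chrF metric in sacrebleu is the safer choice.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace."""
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf_like(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style sentence score in [0, 100] (hypothetical helper)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip n-gram orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped matching n-gram count
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2  # beta > 1 weights recall more heavily than precision
    return 100 * (1 + b2) * p * r / (b2 * p + r)
```

An exact match scores 100.0, a hypothesis that shares no characters with the reference scores 0.0, and partial overlaps fall in between.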
Limitations
- Dataset Size: The custom dataset consists of 12,000 samples, which may limit coverage of diverse expressions and rare terms.
- Colloquial Variations: Egyptian Arabic has many dialectal variations, and the model may not cover them all equally well.
Acknowledgements
This model builds on NLLB-200 distilled 600M, developed by Facebook AI, and is fine-tuned specifically for the Egyptian Arabic dialect.
Feel free to contribute or provide feedback to help improve this model!