library_name: transformers
license: mit
datasets:
- Helsinki-NLP/opus-100
Model Details
- Developed by: Pirai AI Team
- Model type: Sequence-to-Sequence (Seq2Seq) Model
- Language(s) (NLP): English and Bahasa Melayu
Inference
import time
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# Load the fine-tuned tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("PiraiAI/fine-tuned-bahasa-melayu-2L")
model = AutoModelForSeq2SeqLM.from_pretrained("PiraiAI/fine-tuned-bahasa-melayu-2L")
Uses
Direct Use
This model can be used directly to translate text from English to Bahasa Melayu and vice versa. It is suitable for applications in translation services, bilingual communication tools, and other language processing tasks.
Downstream Use
When fine-tuned for specific translation tasks or integrated into larger systems, this model can be applied in various multilingual applications, such as automated customer support systems, language learning tools, and content localization.
Out-of-Scope Use
The model is not designed for tasks other than translation between English and Bahasa Melayu. It is not expected to perform well on other language pairs or on non-translation text processing tasks.
Bias, Risks, and Limitations
Biases
The model may reflect biases present in the training data, which could affect translation quality and accuracy, particularly in handling culturally sensitive or nuanced content.
Risks
- Misinformation: Inaccurate translations could lead to misunderstandings or the dissemination of incorrect information.
- Cultural Sensitivity: The model may struggle with context-specific translations that require cultural understanding.
Recommendations
Users should review translations for accuracy and appropriateness, especially in sensitive contexts. Additional fine-tuning or post-processing might be needed for specialized applications.
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.
How to Get Started with the Model
Use the code below to get started with the model.
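The following is a minimal sketch of a translation call. It assumes the checkpoint works with the standard `generate` API and accepts raw source text without a task prefix or language tag; this card confirms neither, nor does it say how the translation direction is selected. The `translate` and `main` helpers and the sample sentence are illustrative names, not part of the model repository.

```python
import time

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_ID = "PiraiAI/fine-tuned-bahasa-melayu-2L"


def translate(texts, tokenizer, model, max_new_tokens=128):
    """Translate a list of strings (hypothetical helper for illustration)."""
    # Assumption: the model takes raw source text with no task prefix.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():  # inference only, so skip gradient tracking
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)


def main():
    # Downloads the tokenizer and weights from the Hugging Face Hub on first use.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    model.eval()

    start = time.perf_counter()
    results = translate(["Good morning, how are you?"], tokenizer, model)
    elapsed = time.perf_counter() - start

    print(results[0])
    print(f"Translated in {elapsed:.2f} s")
```

Calling `main()` loads the model and prints one translation with its wall-clock latency. If the output looks untranslated, the checkpoint may expect a prefix or language tag that this sketch omits.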
Training Details
Training Data
The model was fine-tuned on the Helsinki-NLP/opus-100 dataset (en-ms), which consists of 1 million parallel text pairs for English-Bahasa Melayu translations.
Model Card Contact
For questions or additional information, please contact: [email protected]