metadata

library_name: transformers
license: mit
datasets:
  - Helsinki-NLP/opus-100

Developed by: Pirai AI Team
Model type: Sequence-to-Sequence (Seq2Seq) Model
Language(s) (NLP): English and Bahasa Melayu

Inference

import time
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("PiraiAI/fine-tuned-bahasa-melayu-2L")
model = AutoModelForSeq2SeqLM.from_pretrained("PiraiAI/fine-tuned-bahasa-melayu-2L")
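The loading code above stops before producing a translation. The following is a minimal sketch of the remaining step, assuming the checkpoint accepts raw source text through the standard `generate` API. This card does not say whether a task prefix or language tag is required, so check the tokenizer and model configuration before relying on it. The helper names `translate` and `demo`, the sample sentence, and the `max_new_tokens` value are illustrative assumptions.

```python
def translate(texts, tokenizer, model, max_new_tokens=128):
    """Translate a batch of strings with a seq2seq tokenizer/model pair.

    Assumption: the checkpoint takes raw source text with no task prefix.
    """
    # Pad and truncate so sentences of different lengths form one batch.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # max_new_tokens caps the length of each generated translation.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop padding and end-of-sequence tokens from the decoded strings.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)


def demo():
    """Download the checkpoint and translate one sample sentence."""
    # Imported here so translate() can be reused without loading transformers.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    name = "PiraiAI/fine-tuned-bahasa-melayu-2L"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    return translate(["Good morning, how are you?"], tokenizer, model)
```

Calling `demo()` downloads the model weights on first use. Passing several sentences in one list lets the tokenizer batch them in a single `generate` call.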

Uses

Direct Use

This model can be used directly to translate text from English to Bahasa Melayu and vice versa. It is suitable for applications in translation services, bilingual communication tools, and other language processing tasks.

Downstream Use

When fine-tuned for specific translation tasks or integrated into larger systems, this model can be applied in various multilingual applications, such as automated customer support systems, language learning tools, and content localization.
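As a starting point for further fine-tuning, the sketch below tokenizes parallel records into model inputs and labels. It assumes records shaped like those in Helsinki-NLP/opus-100, where each record holds a `translation` dictionary keyed by language code. The function name `preprocess` and the `max_length` default are hypothetical, not settings documented for this model.

```python
def preprocess(batch, tokenizer, src_lang="en", tgt_lang="ms", max_length=128):
    """Tokenize a batch of OPUS-100-style records for seq2seq fine-tuning."""
    # In a batched map, batch["translation"] is a list of {"en": ..., "ms": ...}.
    sources = [pair[src_lang] for pair in batch["translation"]]
    targets = [pair[tgt_lang] for pair in batch["translation"]]
    # text_target tokenizes the references and stores them under "labels".
    return tokenizer(
        sources,
        text_target=targets,
        max_length=max_length,
        truncation=True,
    )
```

With the `datasets` library this could be applied as `dataset.map(preprocess, batched=True, fn_kwargs={"tokenizer": tokenizer})`. Swapping `src_lang` and `tgt_lang` prepares data for the Bahasa Melayu to English direction.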

Out-of-Scope Use

The model is not designed for tasks other than translation between English and Bahasa Melayu. It may perform poorly on other language pairs and on text-processing tasks that are not translation.

Bias, Risks, and Limitations

Biases

The model may reflect biases present in the training data, which could affect translation quality and accuracy, particularly in handling culturally sensitive or nuanced content.

Risks

Misinformation: Inaccurate translations could lead to misunderstandings or dissemination of incorrect information.

Cultural Sensitivity: The model may struggle with context-specific translations that require cultural understanding.

Recommendations

Users should review translations for accuracy and appropriateness, especially in sensitive contexts. Additional fine-tuning or post-processing might be needed for specialized applications.
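One lightweight form of post-processing is to flag suspicious outputs for human review. The sketch below is an illustrative heuristic only: the function name `needs_review` and the length-ratio thresholds are assumptions, not values validated for this model. It catches surface-level failures such as empty or untranslated output, and it cannot judge accuracy or cultural appropriateness.

```python
def needs_review(source, translation, min_ratio=0.5, max_ratio=2.0):
    """Return a list of reasons a translation should get human review.

    The thresholds are illustrative assumptions, not values from this card.
    """
    src, out = source.strip(), translation.strip()
    if not out:
        return ["empty output"]

    reasons = []
    # An output equal to the input usually means nothing was translated.
    if out.lower() == src.lower():
        reasons.append("output identical to source")
    # A very short or very long output often signals truncation or repetition.
    ratio = len(out) / max(len(src), 1)
    if ratio < min_ratio or ratio > max_ratio:
        reasons.append("unusual length ratio")
    return reasons
```

An empty list means none of the checks fired, which is not the same as a verified translation. Sensitive content still needs a reviewer who knows both languages.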

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.

How to Get Started with the Model

Use the code in the Inference section above to load the tokenizer and model.

Training Details

Training Data

The model was fine-tuned on the en-ms subset of the Helsinki-NLP/opus-100 dataset, which consists of 1 million parallel English-Bahasa Melayu text pairs.
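To inspect the training data, the sketch below loads the en-ms configuration with the `datasets` library and flattens records into (source, target) tuples. It assumes the published layout of Helsinki-NLP/opus-100, where each record is `{"translation": {"en": ..., "ms": ...}}`. The helper names `to_pairs` and `load_en_ms` are hypothetical.

```python
def to_pairs(records, src_lang="en", tgt_lang="ms"):
    """Turn OPUS-100-style records into (source, target) tuples."""
    return [
        (record["translation"][src_lang], record["translation"][tgt_lang])
        for record in records
    ]


def load_en_ms(split="train"):
    """Load one split of the English-Bahasa Melayu configuration."""
    # Imported lazily so to_pairs stays usable without the datasets package.
    from datasets import load_dataset

    return load_dataset("Helsinki-NLP/opus-100", "en-ms", split=split)
```

For example, `to_pairs(load_en_ms().select(range(3)))` returns the first three training pairs. The first call downloads the dataset.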

Model Card Contact

For questions or additional information, please contact: [email protected]