atlasia/Terjman-Large

BounharAbdelaziz committed on May 19

Commit 3b3a6bb • 1 Parent(s): aff104d

Update README.md

README.md +51 -18

@@ -10,30 +10,16 @@ model-index:
   results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
 # Terjman-Large
-This model is a fine-tuned version of [Helsinki-NLP/opus-mt-tc-big-en-ar](https://huggingface.co/Helsinki-NLP/opus-mt-tc-big-en-ar) on an unknown dataset.
 It achieves the following results on the evaluation set:
 - Loss: 3.2078
 - Bleu: 8.3292
 - Gen Len: 34.4959
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
 ### Training hyperparameters
@@ -100,4 +86,51 @@ The following hyperparameters were used during training:
 - Transformers 4.40.2
 - Pytorch 2.2.1+cu121
 - Datasets 2.19.1
-- Tokenizers 0.19.1

   results: []
 ---
 # Terjman-Large
+Terjman-Large is a Transformer-based model for translating English into Moroccan Darija.
+It has been fine-tuned on the "atlasia/darija_english" dataset, enhanced with curated corpora, to improve translation quality.
 It achieves the following results on the evaluation set:
 - Loss: 3.2078
 - Bleu: 8.3292
 - Gen Len: 34.4959
 ### Training hyperparameters
 - Transformers 4.40.2
 - Pytorch 2.2.1+cu121
 - Datasets 2.19.1
+- Tokenizers 0.19.1
+## Usage
+Using our model for translation is simple and straightforward.
+You can integrate it into your projects or workflows via the Hugging Face Transformers library.
+Here's a basic example of how to use the model in Python:
+```python
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+# Load the tokenizer and model
+tokenizer = AutoTokenizer.from_pretrained("atlasia/Terjman-Large")
+model = AutoModelForSeq2SeqLM.from_pretrained("atlasia/Terjman-Large")
+# Define the English text to translate
+input_text = "Your English text goes here."
+# Tokenize the input text
+input_tokens = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True)
+# Perform translation
+output_tokens = model.generate(**input_tokens)
+# Decode the output tokens
+output_text = tokenizer.decode(output_tokens[0], skip_special_tokens=True)
+print("Translation:", output_text)
+```
+## Example
+Let's see an example of translating English to Moroccan Darija (written in Arabic script):
+**Input**: "Hello my friend, how's life in Morocco"
+**Output**: "مرحبا يا صاحبي, كيفاش الحياة فالمغرب"
+## Limitations
+This version has some limitations, mainly due to the tokenizer.
+We're currently collecting more data with the aim of continuous improvement.
+## Feedback
+We're continuously working to improve the model's performance and usability, and we will release improvements incrementally.
+If you have any feedback, suggestions, or encounter any issues, please don't hesitate to reach out to us.
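
The Usage snippet in the README translates a single string. As a minimal sketch of how the same calls could be wrapped to translate several sentences at once, the helpers below are hypothetical (`load_terjman` and `translate_batch` are not part of the model card), and the `max_new_tokens` and `num_beams` values are illustrative assumptions rather than settings recommended by the authors.

```python
from typing import List, Sequence


def load_terjman(model_id: str = "atlasia/Terjman-Large"):
    """Load the tokenizer and model named in the README.

    The import is deferred so that translate_batch below can be used
    (or tested with stand-in objects) without transformers installed.
    """
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def translate_batch(
    texts: Sequence[str],
    tokenizer,
    model,
    max_new_tokens: int = 128,  # assumption: illustrative cap on output length
    num_beams: int = 4,         # assumption: illustrative beam width
) -> List[str]:
    """Translate a list of English sentences in one generate() call."""
    if not texts:
        return []
    # Pad to the longest sentence so the batch forms a rectangular tensor.
    inputs = tokenizer(list(texts), return_tensors="pt", padding=True, truncation=True)
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, num_beams=num_beams
    )
    # batch_decode turns every generated sequence back into text.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)


# Example usage (downloads the model on first run):
#   tokenizer, model = load_terjman()
#   for line in translate_batch(["Hello my friend", "How are you?"], tokenizer, model):
#       print(line)
```

Passing the tokenizer and model in as arguments means they are loaded once and reused across calls, which matters for a model of this size.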