BounharAbdelaziz committed on
Commit 3b3a6bb · 1 Parent(s): aff104d

Update README.md

Files changed (1): README.md +51 -18
README.md CHANGED
@@ -10,30 +10,16 @@ model-index:
  results: []
  ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # Terjman-Large
 
- This model is a fine-tuned version of [Helsinki-NLP/opus-mt-tc-big-en-ar](https://huggingface.co/Helsinki-NLP/opus-mt-tc-big-en-ar) on an unknown dataset.
+ Our model is built upon the powerful Transformer architecture, leveraging state-of-the-art natural language processing techniques.
+ It has been fine-tuned on the "atlasia/darija_english" dataset, enhanced with curated corpora to ensure high-quality and accurate translations.
+
  It achieves the following results on the evaluation set:
  - Loss: 3.2078
  - Bleu: 8.3292
  - Gen Len: 34.4959
 
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
 
  ### Training hyperparameters
 
@@ -100,4 +86,51 @@ The following hyperparameters were used during training:
  - Transformers 4.40.2
  - Pytorch 2.2.1+cu121
  - Datasets 2.19.1
- - Tokenizers 0.19.1
+ - Tokenizers 0.19.1
+
+
+ ## Usage
+
+ Using our model for translation is simple and straightforward.
+ You can integrate it into your projects or workflows via the Hugging Face Transformers library.
+ Here's a basic example of how to use the model in Python:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+ # Load the tokenizer and model
+ tokenizer = AutoTokenizer.from_pretrained("atlasia/Terjman-Large")
+ model = AutoModelForSeq2SeqLM.from_pretrained("atlasia/Terjman-Large")
+
+ # Define the English text to translate
+ input_text = "Your English text goes here."
+
+ # Tokenize the input text
+ input_tokens = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True)
+
+ # Perform the translation
+ output_tokens = model.generate(**input_tokens)
+
+ # Decode the output tokens into the translated string
+ output_text = tokenizer.decode(output_tokens[0], skip_special_tokens=True)
+
+ print("Translation:", output_text)
+ ```
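+
+ For quick experiments, the model can also be used through the Transformers `pipeline` helper. The sketch below is a minimal example using the same model id; the `max_length` value is an illustrative assumption, not a tuned setting:
+
+ ```python
+ from transformers import pipeline
+
+ # Load the model and tokenizer via the high-level translation pipeline
+ translator = pipeline("translation", model="atlasia/Terjman-Large")
+
+ # Translate an English sentence into Moroccan Darija
+ # (max_length here is an illustrative choice, not a tuned value)
+ result = translator("Hello my friend, how's life in Morocco", max_length=128)
+ print(result[0]["translation_text"])
+ ```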
+
+ ## Example
+
+ Let's see an example of translating English to Moroccan Darija:
+
+ **Input**: "Hello my friend, how's life in Morocco"
+
+ **Output**: "مرحبا يا صاحبي, كيفاش الحياة فالمغرب"
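+
+ The BLEU score reported above (8.3292) summarizes translation quality on the evaluation set. As a rough sketch of how such a corpus-level score can be computed with the `evaluate` library (the prediction and reference strings below are made-up placeholders, not data from the actual evaluation set):
+
+ ```python
+ import evaluate
+
+ # sacrebleu computes a corpus-level BLEU score from predictions and references
+ bleu = evaluate.load("sacrebleu")
+
+ predictions = ["مرحبا يا صاحبي, كيفاش الحياة فالمغرب"]   # hypothetical model outputs
+ references = [["مرحبا صاحبي, كيف هي الحياة في المغرب"]]  # one or more gold references per prediction
+
+ print(bleu.compute(predictions=predictions, references=references)["score"])
+ ```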
+
+ ## Limitations
+
+ This version has some limitations, mainly due to the tokenizer.
+ We're currently collecting more data with the aim of continuous improvement.
+
+ ## Feedback
+
+ We're continuously striving to improve our model's performance and usability, and we will be improving it incrementally.
+ If you have any feedback or suggestions, or if you encounter any issues, please don't hesitate to reach out to us.