PardisSzah committed
Commit 1466037
1 Parent(s): 6349e8a

Update README.md

  1. README.md +66 -0
README.md CHANGED
@@ -1,3 +1,69 @@
  ---
+ language: fa
  license: mit
+ pipeline_tag: text2text-generation
+
  ---
+
+
+ # PersianTextFormalizer
+
+ This model rewrites informal Persian text as formal Persian text. It was fine-tuned on the Mohavere dataset (Takalli, Vahideh; Kalantari, Fateme; Shamsfard, Mehrnoush, "Developing an Informal-Formal Persian Corpus", 2022), starting from the pretrained model [persian-t5-formality-transfer](https://huggingface.co/erfan226/persian-t5-formality-transfer).
+
+ ## Evaluation Metrics
+
+ | Metric              | Basic Model | Base Persian T5 | Our Model |
+ |---------------------|-------------|-----------------|-----------|
+ | BLEU-1              | 0.524       | 0.212           | **0.636** |
+ | BLEU-2              | 0.358       | 0.137           | **0.511** |
+ | BLEU-3              | 0.254       | 0.096           | **0.416** |
+ | BLEU-4              | 0.18        | 0.068           | **0.337** |
+ | BERTScore Precision | 0.671       | 0.537           | **0.797** |
+ | BERTScore Recall    | 0.712       | 0.570           | **0.805** |
+ | BERTScore F1        | 0.690       | 0.549           | **0.800** |
+ | ROUGE-1 F1          | 0.553       | -               | **0.645** |
+ | ROUGE-2 F1          | 0.274       | -               | **0.427** |
+ | ROUGE-L F1          | 0.522       | -               | **0.628** |
+
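The table above reports BLEU-1 through BLEU-4 but does not say how they were computed. As a rough illustration only, here is a minimal sentence-level sketch that assumes whitespace tokenization, a single reference, and cumulative n-gram orders with no smoothing. The function names `ngrams` and `bleu` are hypothetical, and the evaluation behind the reported numbers may have used different tooling and settings.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Cumulative sentence-level BLEU up to order max_n (assumed setup, no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped overlap: a candidate n-gram is credited at most as often
        # as it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0  # without smoothing, one empty order zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty for candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)
```

Under these assumptions, `bleu(output, reference, max_n=1)` would correspond to a BLEU-1 style score and `max_n=4` to BLEU-4. Corpus-level BLEU, which aggregates n-gram counts over all sentences before dividing, generally gives different values from averaging sentence scores.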
+ ## Usage
+
+ ```python
+
+ from transformers import (T5ForConditionalGeneration, AutoTokenizer, pipeline)
+ import torch
+
+ model = T5ForConditionalGeneration.from_pretrained('parsi-ai-nlpclass/PersianTextFormalizer')
+ tokenizer = AutoTokenizer.from_pretrained('parsi-ai-nlpclass/PersianTextFormalizer')
+
+ # Optional high-level wrapper; test_model below calls the model directly instead.
+ pipe = pipeline(task='text2text-generation', model=model, tokenizer=tokenizer)
+
+ # Move the model to the GPU once, if one is available.
+ device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+ model.to(device)
+
+ def test_model(text):
+     # The input is prefixed with the "informal: " task tag before encoding.
+     inputs = tokenizer.encode("informal: " + text, return_tensors='pt', max_length=128, truncation=True, padding='max_length')
+     inputs = inputs.to(device)
+
+     # Beam search with 4 beams, capped at 128 output tokens.
+     outputs = model.generate(inputs, max_length=128, num_beams=4)
+     print("Output:", tokenizer.decode(outputs[0], skip_special_tokens=True))
+
+ text = "به یکی از دوستام میگم که چرا اینکار رو میکنی چرا به فکرت نباید برسه "
+ print("Original:", text)
+ test_model(text)
+
+ # output: .به یکی از دوستانم می گویم که چرا اینکار را می کنی چرا به فکرت نباید برسد
+
+ text = "کجا مخفیش کردی؟"
+ print("Original:", text)
+ test_model(text)
+
+ # output: کجا او را پنهان کرده ای؟
+
+ text = "نمیکشنمون که"
+ print("Original:", text)
+ test_model(text)
+
+ # output: .ما را که نمی‌کشند.
+
+ ```
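The usage snippet above formalizes one sentence per call. For several sentences, a small batching wrapper may be convenient. The sketch below is not part of the commit: `build_prompts` and `formalize_batch` are hypothetical helpers, stripping surrounding whitespace is an added assumption, and the generation step is passed in as a callable so the batching logic stays independent of any particular model object.

```python
from typing import Callable, List, Sequence

# Task prefix taken from the README's usage snippet.
PREFIX = "informal: "


def build_prompts(texts: Sequence[str]) -> List[str]:
    """Prepend the task prefix; stripping outer whitespace is an assumption."""
    return [PREFIX + text.strip() for text in texts]


def formalize_batch(
    texts: Sequence[str],
    generate: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> List[str]:
    """Run `generate` over the prefixed prompts in chunks of `batch_size`.

    `generate` is a hypothetical callable that maps a list of prompts to a
    list of output strings of the same length.
    """
    prompts = build_prompts(texts)
    outputs: List[str] = []
    for start in range(0, len(prompts), batch_size):
        outputs.extend(generate(prompts[start:start + batch_size]))
    return outputs
```

Assuming the `pipe` object from the snippet above, something like `lambda batch: [o["generated_text"] for o in pipe(batch, max_length=128, num_beams=4)]` could serve as `generate`. The exact return shape of the pipeline may vary between transformers versions, so it is worth checking before relying on it.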