Commit 1466037 by PardisSzah (1 parent: 6349e8a)
Update README.md

README.md CHANGED
---
language: fa
license: mit
pipeline_tag: text2text-generation
---

# PersianTextFormalizer

This model rewrites informal Persian text as formal Persian text. It was fine-tuned on the Mohavere Dataset (Takalli, Vahideh; Kalantari, Fateme; Shamsfard, Mehrnoush. *Developing an Informal-Formal Persian Corpus*, 2022), starting from the pretrained model [persian-t5-formality-transfer](https://huggingface.co/erfan226/persian-t5-formality-transfer).

## Evaluation Metrics

| Metric              | Basic Model | Base Persian T5 | Our Model |
|---------------------|-------------|-----------------|-----------|
| BLEU-1              | 0.524       | 0.212           | **0.636** |
| BLEU-2              | 0.358       | 0.137           | **0.511** |
| BLEU-3              | 0.254       | 0.096           | **0.416** |
| BLEU-4              | 0.18        | 0.068           | **0.337** |
| BERTScore Precision | 0.671       | 0.537           | **0.797** |
| BERTScore Recall    | 0.712       | 0.570           | **0.805** |
| BERTScore F1        | 0.690       | 0.549           | **0.800** |
| ROUGE-1 F1          | 0.553       | -               | **0.645** |
| ROUGE-2 F1          | 0.274       | -               | **0.427** |
| ROUGE-L F1          | 0.522       | -               | **0.628** |

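The table does not state how the scores were computed. As a rough illustration of what a BLEU-n figure measures, the sketch below implements cumulative sentence-level BLEU in plain Python. Whitespace tokenization, a single reference per sentence, uniform n-gram weights, and the absence of smoothing are all assumptions made for this sketch, not details taken from the evaluation above, and the reference sentence in the example is hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Cumulative sentence-level BLEU-max_n against a single reference.

    Assumptions (not from the model card): whitespace tokenization,
    uniform n-gram weights, no smoothing.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precisions.append(math.log(overlap / total))
    # The brevity penalty lowers the score of candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


# Candidate taken from the usage examples in this README; the reference is hypothetical.
candidate = "ما را که نمیکشند"
reference = "ما را نمیکشند"
for n in range(1, 5):
    print(f"BLEU-{n}: {bleu(candidate, reference, n):.3f}")
```

Corpus-level scores such as those in the table are normally aggregated over a whole test set rather than averaged per sentence, so numbers from this sketch are not directly comparable to them.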
## Usage

```python
from transformers import T5ForConditionalGeneration, AutoTokenizer
import torch

MODEL_ID = 'parsi-ai-nlpclass/PersianTextFormalizer'

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model = T5ForConditionalGeneration.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Move the model to the GPU once, if one is available.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

def test_model(text):
    # The input must carry the "informal: " task prefix.
    inputs = tokenizer.encode("informal: " + text, return_tensors='pt', max_length=128, truncation=True, padding='max_length')
    inputs = inputs.to(device)

    # Beam search with 4 beams, capped at 128 generated tokens.
    outputs = model.generate(inputs, max_length=128, num_beams=4)
    print("Output:", tokenizer.decode(outputs[0], skip_special_tokens=True))

text = "به یکی از دوستام میگم که چرا اینکار رو میکنی چرا به فکرت نباید برسه "
print("Original:", text)
test_model(text)

# output: .به یکی از دوستانم می گویم که چرا اینکار را می کنی چرا به فکرت نباید برسد

text = "کجا مخفیش کردی؟"
print("Original:", text)
test_model(text)

# output: کجا او را پنهان کرده ای؟

text = "نمیکشنمون که"
print("Original:", text)
test_model(text)

# output: .ما را که نمیکشند.
```
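To formalize several sentences in one call, the same steps can be wrapped in a batch helper. The following is a minimal sketch: `formalize_batch` is a hypothetical name introduced here, and it assumes `model` and `tokenizer` are the objects loaded in the usage snippet above. Unlike that snippet, it pads only to the longest sentence in the batch and passes the attention mask to `generate`.

```python
def formalize_batch(texts, model, tokenizer, max_length=128, num_beams=4):
    """Formalize a list of informal sentences in a single generate() call.

    Hypothetical helper, not part of the model repository.
    """
    # Every input needs the "informal: " task prefix used in the README example.
    prompts = ["informal: " + text for text in texts]
    # padding=True pads to the longest prompt in the batch.
    inputs = tokenizer(
        prompts,
        return_tensors='pt',
        padding=True,
        truncation=True,
        max_length=max_length,
    ).to(model.device)  # keep the inputs on the same device as the model
    outputs = model.generate(**inputs, max_length=max_length, num_beams=num_beams)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


# Usage, assuming `model` and `tokenizer` are already loaded:
# for formal in formalize_batch(["کجا مخفیش کردی؟", "نمیکشنمون که"], model, tokenizer):
#     print(formal)
```

Batching mainly helps throughput on a GPU; for a handful of sentences the one-at-a-time function is just as practical.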