parsi-ai-nlpclass/PersianTextFormalizer


PardisSzah committed on Mar 3

Commit 1466037 • 1 Parent(s): 6349e8a

Update README.md

Files changed (1)

README.md +66 -0


@@ -1,3 +1,69 @@
 ---
 license: mit
 ---

 ---
+language: fa
 license: mit
+pipeline_tag: text2text-generation
 ---
+# PersianTextFormalizer
+This model converts informal Persian text into formal Persian text. It was fine-tuned on the Mohavere Dataset (Takalli, Vahideh; Kalantari, Fateme; Shamsfard, Mehrnoush. "Developing an Informal-Formal Persian Corpus", 2022), starting from the pretrained model [persian-t5-formality-transfer](https://huggingface.co/erfan226/persian-t5-formality-transfer).
+## Evaluation Metrics
+| Metric               | Basic Model | Base Persian T5 | Our Model   |
+|----------------------|-------------|-----------------|-------------|
+| BLEU-1               | 0.524       | 0.212           | **0.636**   |
+| BLEU-2               | 0.358       | 0.137           | **0.511**   |
+| BLEU-3               | 0.254       | 0.096           | **0.416**   |
+| BLEU-4               | 0.18        | 0.068           | **0.337**   |
+| BERTScore Precision  | 0.671       | 0.537           | **0.797**   |
+| BERTScore Recall     | 0.712       | 0.570           | **0.805**   |
+| BERTScore F1 Score   | 0.690       | 0.549           | **0.800**   |
+| ROUGE-1 F1 Score     | 0.553       | -               | **0.645**   |
+| ROUGE-2 F1 Score     | 0.274       | -               | **0.427**   |
+| ROUGE-L F1 Score     | 0.522       | -               | **0.628**   |
+## Usage
+```python
+import torch
+from transformers import AutoTokenizer, T5ForConditionalGeneration
+model_name = 'parsi-ai-nlpclass/PersianTextFormalizer'
+model = T5ForConditionalGeneration.from_pretrained(model_name)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+# Move the model to the GPU once, if one is available.
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+model.to(device)
+def test_model(text):
+  # Prepend the "informal: " prefix and tokenize, truncating to 128 tokens.
+  inputs = tokenizer("informal: " + text, return_tensors='pt', max_length=128, truncation=True).to(device)
+  # Beam search with 4 beams; the generated text is capped at 128 tokens.
+  outputs = model.generate(**inputs, max_length=128, num_beams=4)
+  print("Output:", tokenizer.decode(outputs[0], skip_special_tokens=True))
+text = "به یکی از دوستام میگم که چرا اینکار رو میکنی چرا به فکرت نباید برسه "
+print("Original:", text)
+test_model(text)
+# output:  .به یکی از دوستانم می گویم که چرا اینکار را می کنی چرا به فکرت نباید برسد
+text = "کجا مخفیش کردی؟"
+print("Original:", text)
+test_model(text)
+# output:   کجا او را پنهان کرده ای؟
+text = "نمیکشنمون که"
+print("Original:", text)
+test_model(text)
+# output: .ما را که نمی‌کشند.
+```
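
For readers unfamiliar with the BLEU-1 to BLEU-4 rows in the evaluation table, the following is a minimal sketch of how a cumulative BLEU-n score can be computed for one sentence pair. The README does not state which implementation, tokenizer, or smoothing produced the reported numbers, so everything here is an assumption for illustration: whitespace tokenization, a single reference, uniform weights, no smoothing, and a made-up English sentence pair rather than data from the Mohavere test set. The helper names `ngrams`, `modified_precision`, and `bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against a single reference."""
    cand_counts = ngrams(candidate, n)
    ref_counts = ngrams(reference, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total


def bleu(candidate, reference, max_n):
    """Cumulative sentence-level BLEU-max_n (uniform weights, no smoothing)."""
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    # The brevity penalty discourages candidates shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


# Hypothetical whitespace-tokenized pair, not taken from the evaluation set.
reference = "the cat sat on the mat".split()
candidate = "the cat sat on a mat".split()
for n in range(1, 5):
    print(f"BLEU-{n}: {bleu(candidate, reference, n):.3f}")
```

Scores from this sketch are not expected to match the table exactly, since corpus-level aggregation, tokenization, and smoothing choices all affect BLEU.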