---
license: apache-2.0
language: fa
widget:
 - text: "از هر دستی بگیری از همون [MASK] میدی"
 - text: "این آخرین باره بهت [MASK] میگم"
 - text: "چرا آن جوان بیچاره را به سخره [MASK]"
 - text: "آخه محسن [MASK] هم شد خواننده؟"
 - text: "پسر عجب [MASK] زد"
tags:
- bert-fa
- bert-persian
model-index:
- name: dal-bert
  results: []
---


DAL-BERT: Another pre-trained language model for Persian
---

DAL-BERT is a transformer-based model trained on more than 80 gigabytes of Persian text covering both formal and informal (conversational) registers. Its architecture follows the original BERT [[Devlin et al.](https://arxiv.org/abs/1810.04805)].
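
The configuration uploaded with the checkpoint records the exact architecture, so it can be inspected directly from the Hub. A minimal sketch (only the config file is fetched, not the weights; the printed values come from the uploaded config):

```python
from transformers import AutoConfig

# Load just the model configuration from the Hugging Face Hub
config = AutoConfig.from_pretrained('sharif-dal/dal-bert')

print(config.model_type)           # e.g. 'bert'
print(config.num_hidden_layers)    # transformer depth
print(config.hidden_size)          # hidden/embedding dimension
print(config.num_attention_heads)  # attention heads per layer
print(config.vocab_size)           # tokenizer vocabulary size
```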

How to Use the Model
---
```python
from transformers import BertForMaskedLM, BertTokenizer, pipeline

model = BertForMaskedLM.from_pretrained('sharif-dal/dal-bert')
tokenizer = BertTokenizer.from_pretrained('sharif-dal/dal-bert')
fill_sentence = pipeline('fill-mask', model=model, tokenizer=tokenizer)

# The placeholder below reads: "Write your sentence here and [MASK] the target word"
fill_sentence('اینجا جمله مورد نظر خود را بنویسید و کلمه موردنظر را [MASK] کنید')
```
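
The `fill-mask` pipeline returns a list of candidate completions, each a dictionary. A short sketch of reading them out, using one of the widget examples above ("This is the last time I'm telling you [MASK]"); the field names are the standard `transformers` fill-mask output:

```python
predictions = fill_sentence('این آخرین باره بهت [MASK] میگم')
for p in predictions:
    # token_str: the predicted filler, score: its probability, sequence: the completed sentence
    print(f"{p['token_str']}\t{p['score']:.4f}\t{p['sequence']}")
```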

The Training Data
---
The model was trained on texts drawn from newspapers, news agencies' websites, technology-related sources, user comments, magazines, literary criticism, and blogs.

Evaluation
---

| Training Loss | Epoch | Step      |
|:-------------:|:-----:|:---------:|
| 2.1855        | 13    | 7,649,486 |
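
The loss above is the usual masked-language-modelling cross-entropy. As a rough illustration of what that number measures, the sketch below masks a single token of a sample sentence and reads off the model's loss at that position; the sentence and the masked position are arbitrary choices for illustration, not part of the original evaluation:

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('sharif-dal/dal-bert')
model = BertForMaskedLM.from_pretrained('sharif-dal/dal-bert')
model.eval()

# Tokenize a complete sentence, then hide one token and ask the model to recover it
inputs = tokenizer('این آخرین باره بهت میگم', return_tensors='pt')
labels = torch.full_like(inputs['input_ids'], -100)  # -100 = ignored by the loss

pos = 4  # an arbitrary in-sentence position to mask
labels[0, pos] = inputs['input_ids'][0, pos]         # gold label = the original token
inputs['input_ids'][0, pos] = tokenizer.mask_token_id

with torch.no_grad():
    loss = model(**inputs, labels=labels).loss       # cross-entropy at the masked position
print(loss.item())  # same scale as the training loss reported above
```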

Contributors
---
- Arman Malekzadeh [[GitHub](https://github.com/arm-on)]
- Amirhossein Ramazani, Master's Student in AI @ Sharif University of Technology [[LinkedIn](https://www.linkedin.com/in/amirhossein-ramazani/)] [[GitHub](https://github.com/amirhossein1376)]