Kamel committed on
Commit 331e57f
Parent: 99311f6

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -1,11 +1,11 @@
- **DBERT** is the first BERT model for the Moroccan Arabic dialect called “Darija”. It is based on the same architecture as BERT-base, but without the Next Sentence Prediction (NSP) objective. This model was trained on ~3 million sequences of Darija dialect, representing 691 MB of text or ~100M tokens.
+ **DarijaBERT** is the first BERT model for the Moroccan Arabic dialect called “Darija”. It is based on the same architecture as BERT-base, but without the Next Sentence Prediction (NSP) objective. This model was trained on ~3 million sequences of Darija dialect, representing 691 MB of text or ~100M tokens.
 
  The model was trained on a dataset drawn from three different sources:
  * Stories written in Darija scraped from a dedicated website
  * YouTube comments from 40 different Moroccan channels
  * Tweets crawled based on a list of Darija keywords.
 
- More details about DarijaBERT are available in the dedicated GitHub repository
+ More details about DarijaBERT are available in the dedicated GitHub [repository](https://github.com/AIOXLABS/DBert)
 
  **Loading the model**
 
@@ -13,8 +13,8 @@ The model can be loaded directly using the Huggingface library:
 
  ```python
  from transformers import AutoTokenizer, AutoModel
- DBERT_tokenizer = AutoTokenizer.from_pretrained("Kamel/DBERT")
- DBERT_Bert_model = AutoModel.from_pretrained("Kamel/DBERT")
+ DBERT_tokenizer = AutoTokenizer.from_pretrained("Kamel/DarijaBERT")
+ DBERT_Bert_model = AutoModel.from_pretrained("Kamel/DarijaBERT")
  ```
 
  **Acknowledgments**
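The README describes DarijaBERT as BERT-base trained without the NSP objective, which presumably leaves masked language modelling (MLM) as the only pre-training task. The README does not give the masking details, so the sketch below assumes the standard BERT recipe: select about 15% of non-special tokens, then replace 80% of the selected tokens with `[MASK]`, replace 10% with a random token, and keep the remaining 10% unchanged. `mask_tokens`, its parameters, and the toy token ids are hypothetical illustrations. They are not taken from the DarijaBERT code.

```python
import random
from typing import List, Optional, Set, Tuple

# Label value for positions that contribute no loss
# (matches the default ignore_index of PyTorch's CrossEntropyLoss).
IGNORE_INDEX = -100


def mask_tokens(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    special_ids: Set[int],
    mlm_prob: float = 0.15,  # assumed BERT default, not stated in the README
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Hypothetical BERT-style MLM masking for a single sequence.

    Returns (inputs, labels): labels keep the original id at positions
    selected for prediction and IGNORE_INDEX everywhere else.
    """
    if rng is None:
        rng = random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never mask special tokens such as [CLS]/[SEP]; otherwise select
        # each position independently with probability mlm_prob.
        if tok in special_ids or rng.random() >= mlm_prob:
            continue
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: leave the original token in place
    return inputs, labels


if __name__ == "__main__":
    # Toy ids: 101/102 stand in for [CLS]/[SEP], 103 for [MASK].
    ids = [101, 7, 8, 9, 102]
    inputs, labels = mask_tokens(ids, mask_id=103, vocab_size=1000,
                                 special_ids={101, 102}, mlm_prob=0.5,
                                 rng=random.Random(0))
    print(inputs, labels)
```

In practice, the Hugging Face `transformers` library provides `DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)`, which applies the same 80/10/10 scheme to whole batches. It is usually preferable to a hand-written loop.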