SI2M-Lab/DarijaBERT

Moroccan Arabic

Kamel committed on Oct 31, 2021

Commit 331e57f (1 parent: 99311f6)

Update README.md

Files changed (1)

README.md +4 -4

README.md CHANGED

@@ -1,11 +1,11 @@
-**DBERT** is the first BERT model for the Moroccan Arabic dialect called “Darija”. It is based on the same architecture as BERT-base, but without the Next Sentence Prediction (NSP) objective. This model was trained on a total of ~3 Million sequences of Darija dialect representing 691MB of text or a total of ~100M tokens.
 The model was trained on a dataset issued from three different sources:
 *  Stories written in Darija scrapped from a dedicated website
 *  Youtube comments from 40 different Moroccan channels
 *  Tweets crawled based on a list of Darija keywords.
-More details about DarijaBert are available in the dedicated GitHub repository
 **Loading the model**
@@ -13,8 +13,8 @@ The model can be loaded directly using the Huggingface library:
 ```python
 from transformers import AutoTokenizer, AutoModel
-DBERT_tokenizer = AutoTokenizer.from_pretrained("Kamel/DBERT")
-DBERT_Bert_model = AutoModel.from_pretrained("Kamel/DBERT")
 ```
 **Acknowledgments**

+**DarijaBERT** is the first BERT model for the Moroccan Arabic dialect called “Darija”. It is based on the same architecture as BERT-base, but without the Next Sentence Prediction (NSP) objective. This model was trained on a total of ~3 Million sequences of Darija dialect representing 691MB of text or a total of ~100M tokens.
 The model was trained on a dataset issued from three different sources:
 *  Stories written in Darija scrapped from a dedicated website
 *  Youtube comments from 40 different Moroccan channels
 *  Tweets crawled based on a list of Darija keywords.
+More details about DarijaBert are available in the dedicated GitHub [repository](https://github.com/AIOXLABS/DBert)
 **Loading the model**
 ```python
 from transformers import AutoTokenizer, AutoModel
+DBERT_tokenizer = AutoTokenizer.from_pretrained("Kamel/DarijaBERT")
+DBERT_Bert_model = AutoModel.from_pretrained("Kamel/DarijaBERT")
 ```
 **Acknowledgments**
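
The loading snippet on the updated side of this diff could be wrapped roughly as follows. This is a minimal sketch, not part of the commit: the helper names `load_darijabert` and `embed` are hypothetical, and the identifier `"Kamel/DarijaBERT"` is taken as written in the diff (the repository header shows the `SI2M-Lab` namespace, so the identifier may have moved since). It assumes `transformers` and `torch` are installed.

```python
# Identifier as written in this commit; it may have moved to another namespace since.
MODEL_ID = "Kamel/DarijaBERT"


def load_darijabert(model_id: str = MODEL_ID):
    """Return (tokenizer, model) for the given Hub identifier (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def embed(text: str, tokenizer, model):
    """Return the final-layer hidden states for one input string (hypothetical helper)."""
    import torch

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():  # no gradients needed for feature extraction
        outputs = model(**inputs)
    return outputs.last_hidden_state  # shape: (1, sequence_length, hidden_size)
```

Calling `tokenizer, model = load_darijabert()` followed by `embed("...", tokenizer, model)` downloads the weights on first use, so it needs network access to the Hub.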