---
library_name: transformers
tags:
- embeddings
- darija
- arabic
- DarijaBERT
- camelbert
- fine-tuning
datasets:
- HANTIFARAH/combined_darija_dataset_cleaned
language:
- ar
metrics:
- accuracy
base_model:
- SI2M-Lab/DarijaBERT
pipeline_tag: fill-mask
---
# Model Card for Fine-Tuned SI2M_DarijaBERT and CamelBERT
This model card outlines the fine-tuning of **SI2M_DarijaBERT** on a subset of a large Moroccan Darija dataset scraped from YouTube transcriptions and other websites, available at https://huggingface.co/datasets/HANTIFARAH/combined_darija_dataset_cleaned. The model was fine-tuned to generate embeddings for Moroccan Darija, with the aim of improving its performance on specific NLP tasks, and its embeddings were evaluated on text classification tasks.
## Model Details
### Model Description
The **SI2M_DarijaBERT** model has been fine-tuned on Moroccan Darija texts. The model is based on the BERT architecture and specializes in generating embeddings for text classification tasks in Moroccan Darija.
- **Developed by:** [BAGUENNA Mohammed-Amine]
- **Model type:** Transformer-based (BERT architecture)
- **Language(s) (NLP):** Moroccan Darija (Arabic dialect)
- **Finetuned from model:** SI2M_DarijaBERT
### Recommendations
Users should take care to ensure their data falls within the domain of Moroccan Darija text. Further fine-tuning with more specialized data is recommended for domain-specific applications (e.g., medical language).
## How to Get Started with the Model
You can load the model with the following code:
```python
from transformers import AutoModel, AutoTokenizer

# Load the fine-tuned encoder and its matching tokenizer from the Hugging Face Hub
model = AutoModel.from_pretrained("bagamine/SI2M_DarijaBERTV1")
tokenizer = AutoTokenizer.from_pretrained("bagamine/SI2M_DarijaBERTV1")
```
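Since the model is intended for embedding generation, the sketch below shows one possible way to turn its outputs into sentence embeddings. The pooling strategy is an assumption: this card does not say how embeddings were extracted, and mean pooling over non-padding tokens is simply a common choice. The helper names `mean_pool` and `embed` are hypothetical and not part of the released model.

```python
import torch


def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors per sentence, ignoring padding positions.

    Assumption: mean pooling is used here for illustration only; the model
    card does not specify a pooling strategy.
    """
    # (batch, seq_len) -> (batch, seq_len, 1) so it broadcasts over hidden dims
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    # Guard against division by zero for an all-padding row
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts


def embed(texts, model_id="bagamine/SI2M_DarijaBERTV1"):
    """Hypothetical helper: returns one embedding per input text."""
    # Imported here so that mean_pool can be used without transformers installed
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**batch)
    return mean_pool(outputs.last_hidden_state, batch["attention_mask"])
```

Calling `embed(["..."])` with a list of Darija sentences should return a tensor with one row per sentence, which could then be passed to a downstream classifier of your choice.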