
Model Card for Fine-Tuned SI2M_DarijaBERT and CamelBERT

This model card outlines the fine-tuning of SI2M_DarijaBERT on a truncated portion of a large Moroccan Darija dataset scraped from YouTube transcriptions and other websites, available here: https://huggingface.co/datasets/HANTIFARAH/combined_darija_dataset_cleaned . These transformer models were fine-tuned to generate embeddings for Moroccan Darija, improving their performance on specific NLP tasks; the resulting embeddings were evaluated on text classification tasks.

Model Details

Model Description

The SI2M_DarijaBERT model has been fine-tuned on Moroccan Darija texts. The model is based on the BERT architecture and specializes in generating embeddings for text classification tasks in Moroccan Darija.

  • Developed by: BAGUENNA Mohammed-Amine
  • Model type: Transformer-based (BERT architecture)
  • Language(s) (NLP): Moroccan Darija (Arabic dialect)
  • Finetuned from model: SI2M_DarijaBERT

Recommendations

Users should take care to ensure their data falls within the domain of Moroccan Darija text. Further fine-tuning with more specialized data is recommended for domain-specific applications (e.g., medical language).
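As an illustration of what such domain-adaptive fine-tuning could look like, the sketch below continues masked-language-model training on an in-domain corpus. This is a minimal sketch, assuming a hypothetical file my_domain_corpus.txt with one Darija sentence per line; the hyperparameters are placeholders, not the settings used to train this model.

from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bagamine/SI2M_DarijaBERTV1")
model = AutoModelForMaskedLM.from_pretrained("bagamine/SI2M_DarijaBERTV1")

# Hypothetical in-domain corpus: one Darija sentence per line.
dataset = load_dataset("text", data_files={"train": "my_domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="darijabert-domain-adapted",
    num_train_epochs=1,                # placeholder: tune to your corpus size
    per_device_train_batch_size=16,    # placeholder
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()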

How to Get Started with the Model

You can use the model with the following code:

from transformers import AutoTokenizer, AutoModel

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModel.from_pretrained("bagamine/SI2M_DarijaBERTV1")
tokenizer = AutoTokenizer.from_pretrained("bagamine/SI2M_DarijaBERTV1")
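
Since the model is intended for embedding generation, a typical next step is to turn a Darija sentence into a fixed-size vector that can feed a downstream classifier. This is a minimal sketch, assuming mean pooling over the last hidden states; the pooling strategy and the example sentence are illustrative choices, not prescribed by this card.

import torch

sentences = ["واش هاد الموديل خدام مزيان؟"]  # illustrative Darija input
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool token embeddings, ignoring padding positions
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

print(embeddings.shape)  # (batch_size, hidden_size)

These vectors can then be used as features for a text classification model, for example a logistic regression or a small feed-forward head.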