Binary SDG Detection with ArBERTv2

This model is a binary classifier fine-tuned from ArBERTv2 to detect mentions of the United Nations Sustainable Development Goals (SDGs) in Arabic text. It separates SDG-related content from non-SDG content, which makes it suitable for classifying Arabic news articles and other textual data.

Model Details

Intended Use

The model is intended for use in identifying SDG-related content within large collections of Arabic text, such as news articles, reports, or social media. It can be applied to media analysis, policy research, and academic studies focused on tracking SDG coverage in Arabic-speaking regions.
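As a rough illustration of the coverage-tracking use case, the sketch below aggregates per-document predictions into a share of SDG-related documents per group. It assumes each document already has a predicted class (1 for SDG, 0 for non-SDG); the `sdg_coverage` helper, the outlet names, and the sample records are hypothetical and not part of the model or its dataset.

```python
from collections import defaultdict


def sdg_coverage(records):
    """Return the share of SDG-related documents per group.

    `records` is an iterable of (group, predicted_class) pairs, where
    predicted_class is 1 for SDG-related text and 0 otherwise.
    """
    totals = defaultdict(int)
    sdg_counts = defaultdict(int)
    for group, predicted_class in records:
        totals[group] += 1
        sdg_counts[group] += int(predicted_class == 1)
    return {group: sdg_counts[group] / totals[group] for group in totals}


# Hypothetical predictions: (outlet name, predicted class)
records = [("outlet_a", 1), ("outlet_a", 0), ("outlet_b", 1), ("outlet_b", 1)]
coverage = sdg_coverage(records)
print(coverage)
```

The grouping key could just as well be a publication month or a country, depending on what the analysis is tracking.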

How to Use

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Kamel/AraSDG_Binary")
model = AutoModelForSequenceClassification.from_pretrained("Kamel/AraSDG_Binary")

# Example text input
text = "your Arabic text here"

# Tokenize input (truncate texts longer than the model's maximum length)
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Perform inference
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Convert logits to predicted class (0: non-SDG, 1: SDG)
predicted_class = torch.argmax(logits, dim=-1).item()

# Print the result
if predicted_class == 1:
    print("This text is SDG-related.")
else:
    print("This text is not SDG-related.")
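When a confidence score is more useful than a hard label, the two logits can be turned into a probability with a softmax. The following is a minimal sketch in plain Python, assuming the label order stated above (index 0: non-SDG, index 1: SDG). The helper names, the example logits, and the 0.5 default threshold are illustrative assumptions, not values published with the model.

```python
import math


def sdg_probability(logits):
    """Apply a softmax to [non-SDG, SDG] logits and return P(SDG)."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    return exps[1] / sum(exps)


def is_sdg(logits, threshold=0.5):
    """Label a text as SDG-related when P(SDG) reaches the threshold."""
    return sdg_probability(logits) >= threshold


# Made-up logits for a single text, in the order [non-SDG, SDG]
example_logits = [-1.2, 2.3]
probability = sdg_probability(example_logits)
print(f"P(SDG) = {probability:.3f}")
```

With the inference code above, `logits[0].tolist()` yields the two-element list these helpers expect. Raising the threshold trades recall for precision, which may matter when screening large collections.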

Training Data

The model was fine-tuned on a dataset of Arabic news articles annotated for SDG relevance, augmented with synthetic data generated to balance SDG-related and non-SDG content.

Performance

The model achieves a macro F1-score of 98% on a test dataset. Because macro F1 averages the per-class F1 scores, this indicates strong performance on both the SDG and non-SDG classes.
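For readers who want to check the metric on their own labeled data, here is a small sketch of how macro F1 is computed: the F1 score is calculated for each class separately and the results are averaged with equal weight. The `macro_f1` helper and the toy labels are illustrative only and are not the model's actual evaluation data.

```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Average the per-class F1 scores with equal weight per class."""
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy gold labels and predictions (1: SDG, 0: non-SDG)
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.3f}")
```

Since both classes count equally, a model that does well on only one class cannot reach a high macro F1.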

Limitations

This model only provides binary classification (SDG vs. non-SDG). It is trained specifically for Modern Standard Arabic (MSA) and may not perform as well on dialectal Arabic.
