ArGTClass is a BLOOMZ-based classification model, fine-tuned to categorize Arabic text into fourteen subjects: Religion, Finance and Economics, Politics, Medical, Culture, Sports, Science and Technology, Anthropology and Sociology, Art and Literature, Education, History, Language and Linguistics, Law, and Philosophy.

For more details, check out our paper

The fine-tuning code is available in the following notebook: Open In Colab

Full classification example (CPU)

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("dru-ac/ArGTClass")
model = AutoModelForSequenceClassification.from_pretrained("dru-ac/ArGTClass")

text = " .قصفت إسرائيل مستشفى المعمداني في مدينة غزة، والذي خلف مئات الشهداء والجرحى"

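# Tokenize the text and run a forward pass
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
# Pick the highest-scoring class index and map it to its label name
ind = outputs.logits.argmax(dim=-1)[0]
predicted_class = model.config.id2label[ind.item()]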
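
The argmax step above returns only the single best label. To see how confident the model is across classes, the logits can be converted to probabilities with a softmax. The following is a minimal sketch in pure Python so that it runs without downloading the model. The helper name `top_k_labels`, the logit values, and the three-entry label mapping are made up for illustration. In practice you would pass something like `outputs.logits[0].tolist()` and `model.config.id2label`.

```python
import math

def top_k_labels(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs for one example."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Hypothetical values for illustration only; the real model has fourteen
# classes and its own id2label ordering.
example_logits = [2.0, 0.5, -1.0]
example_id2label = {0: "Politics", 1: "History", 2: "Sports"}
top = top_k_labels(example_logits, example_id2label, k=2)
```

Here `top` holds the two highest-probability labels in descending order, which can help flag texts where the top two subjects are nearly tied.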

Full classification example (GPU)

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("dru-ac/ArGTClass")
model = AutoModelForSequenceClassification.from_pretrained("dru-ac/ArGTClass", device_map = 'auto')

text = " .قصفت إسرائيل مستشفى المعمداني في مدينة غزة، والذي خلف مئات الشهداء والجرحى"

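# Move the inputs to the same device as the model
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model(**inputs)
# Pick the highest-scoring class index and map it to its label name
ind = outputs.logits.argmax(dim=-1)[0]
predicted_class = model.config.id2label[ind.item()]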

Pipeline example (CPU & GPU)

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
tokenizer = AutoTokenizer.from_pretrained("dru-ac/ArGTClass")
model = AutoModelForSequenceClassification.from_pretrained("dru-ac/ArGTClass", device_map = 'auto')

classifier = pipeline("text-classification", model=model, tokenizer= tokenizer)

text = " .قصفت إسرائيل مستشفى المعمداني في مدينة غزة، والذي خلف مئات الشهداء والجرحى"

classifier(text)
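
A text-classification pipeline also accepts a list of strings and, with default settings, returns one dictionary with `label` and `score` keys per input. The sketch below shows one way to bucket a batch of texts by predicted subject. The helper `group_by_label`, the `min_score` threshold, and the stand-in `fake_classifier` are hypothetical and exist only so that the example runs without downloading the model. With the real model you would pass the `classifier` object created above.

```python
from collections import defaultdict

def group_by_label(classifier, texts, min_score=0.0):
    """Classify texts in one call and bucket them by predicted label."""
    # Expected result shape: one {"label": ..., "score": ...} dict per text.
    results = classifier(texts)
    groups = defaultdict(list)
    for text, result in zip(texts, results):
        # Skip predictions below the confidence threshold.
        if result["score"] >= min_score:
            groups[result["label"]].append(text)
    return dict(groups)

# Stand-in for the real pipeline (assumption: it mimics the output format only).
def fake_classifier(texts):
    return [
        {"label": "Sports" if "match" in t else "Politics", "score": 0.9}
        for t in texts
    ]

grouped = group_by_label(
    fake_classifier,
    ["the match ended 2-1", "parliament voted today"],
)
```

Raising `min_score` drops low-confidence predictions from the result, which can be useful when only clear-cut subject assignments are wanted.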