metadata

license: mit
language:
  - ar
metrics:
  - accuracy
datasets:
  - arbml/CLEANANERCorp
pipeline_tag: token-classification

Arabic Named Entity Recognition (NER) Model

Overview

This Named Entity Recognition (NER) model is designed specifically for the Arabic language. Built from scratch without pretrained models, it recognizes entities such as company names, person names, and cities.

The model is trained using TensorFlow and works with a custom dataset split into training, validation, and test sets.

Model Highlights

Language: Arabic
Framework: TensorFlow
Data Format: Text files (txt format) with train, validation, and test splits

Entities Recognized:

ORG: Organizations (e.g., company names)
LOC: Locations (e.g., cities, countries)
PERS: Persons (e.g., names, excluding common/popular names)
MISC: Miscellaneous (e.g., other identifiable private information)

Intended Use: Arabic text processing, personal data anonymization, data extraction.
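The four entity types listed above can be turned into the integer label ids that a token classifier predicts. The sketch below is illustrative only: the BIO prefixes (`B-`, `I-`, `O`) and the names `ENTITY_TYPES`, `build_label_map`, `LABEL2ID`, and `ID2LABEL` are assumptions, since this card does not state the tagging scheme the model actually uses.

```python
# Entity types come from this model card; the BIO tagging scheme is an assumption.
ENTITY_TYPES = ["ORG", "LOC", "PERS", "MISC"]


def build_label_map(entity_types):
    """Map each BIO tag to an integer id, with "O" (outside any entity) as id 0."""
    labels = ["O"]
    for entity in entity_types:
        labels.append(f"B-{entity}")  # first token of an entity span
        labels.append(f"I-{entity}")  # continuation token of the same span
    return {label: idx for idx, label in enumerate(labels)}


LABEL2ID = build_label_map(ENTITY_TYPES)
ID2LABEL = {idx: label for label, idx in LABEL2ID.items()}
```

Under this assumed scheme there are nine labels in total: one `O` tag plus a `B-`/`I-` pair for each of the four entity types.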

Dataset and Preprocessing

The dataset used in this model is split into three parts:

Training Set: For model training.
Validation Set: For tuning model hyperparameters and monitoring overfitting.
Test Set: For evaluating final model performance.

Each sample in the dataset contains labeled entities for supervised learning. Data preprocessing steps include tokenization, normalization, and conversion of entities into a format compatible with TensorFlow.
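The preprocessing steps named above (tokenization, normalization, and conversion to a TensorFlow-compatible format) might look roughly like the following. This is a minimal sketch, not the author's pipeline: it assumes each line of a split file holds a token and its tag separated by whitespace, with a blank line between sentences, and the normalization rules (stripping diacritics and tatweel, unifying alef variants) are one common choice rather than a documented one. All function names are hypothetical.

```python
import re

# Arabic diacritics (U+064B to U+0652) and the tatweel character (U+0640).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Alef with madda, hamza above, and hamza below; all mapped to bare alef.
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")


def normalize(token):
    """Strip diacritics/tatweel and unify alef variants (assumed rules)."""
    token = _DIACRITICS.sub("", token)
    return _ALEF_VARIANTS.sub("\u0627", token)


def read_sentences(lines):
    """Parse 'token tag' lines; a blank line ends a sentence (assumed layout)."""
    sentences, tokens, tags = [], [], []
    for line in lines:
        parts = line.split()
        if not parts:  # blank line closes the current sentence
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
        elif len(parts) >= 2:  # lines with a token but no tag are skipped
            tokens.append(normalize(parts[0]))
            tags.append(parts[-1])
    if tokens:
        sentences.append((tokens, tags))
    return sentences


def encode(sentences, max_len):
    """Convert sentences to fixed-length integer id rows, padding with 0."""
    vocab = {"<pad>": 0, "<unk>": 1}
    tag2id = {"<pad>": 0}
    token_ids, tag_ids = [], []
    for tokens, tags in sentences:
        row_x = [vocab.setdefault(t, len(vocab)) for t in tokens][:max_len]
        row_y = [tag2id.setdefault(t, len(tag2id)) for t in tags][:max_len]
        pad = max_len - len(row_x)
        token_ids.append(row_x + [0] * pad)
        tag_ids.append(row_y + [0] * pad)
    return token_ids, tag_ids, vocab, tag2id
```

The padded lists returned by `encode` are rectangular, so they can be passed directly to `tf.constant` or `tf.data.Dataset.from_tensor_slices`. In practice the vocabulary would be built from the training split only, with unseen validation and test tokens mapped to `<unk>`.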

Model Evaluation

The model achieved an accuracy of 0.9675 on the test set, indicating strong performance in recognizing and classifying entities in Arabic text.
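This card does not say how the accuracy figure is computed. One plausible reading is token-level accuracy with padding positions ignored, which can be sketched as below; the function name `token_accuracy` and the convention that padding uses id 0 are assumptions for illustration.

```python
def token_accuracy(y_true, y_pred, pad_id=0):
    """Fraction of non-padding tokens whose predicted tag id equals the gold id."""
    correct = 0
    total = 0
    for gold_row, pred_row in zip(y_true, y_pred):
        for gold, pred in zip(gold_row, pred_row):
            if gold == pad_id:
                continue  # padding positions do not count toward the score
            total += 1
            correct += int(gold == pred)
    return correct / total if total else 0.0
```

For example, `token_accuracy([[1, 2, 3, 0]], [[1, 2, 2, 0]])` scores three real tokens, two of them correct, and skips the padded position.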