Arabic Named Entity Recognition (NER) Model
Overview
This Named Entity Recognition (NER) model is designed specifically for the Arabic language. Built from scratch without pretrained models, it recognizes entities such as company names, personal names, and cities.
The model is trained with TensorFlow on a custom dataset split into training, validation, and test sets.
Model Highlights
- Language: Arabic
- Framework: TensorFlow
- Data Format: Plain-text (.txt) files with train, validation, and test splits
Entities Recognized:
- ORG: Organizations (e.g., company names)
- LOC: Locations (e.g., cities, countries)
- PERS: Persons (e.g., names, excluding common/popular names)
- MISC: Miscellaneous (e.g., other identifiable private information)
- Intended Use: Arabic text processing, personal data anonymization, and data extraction.
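NER models usually encode entity types like these as per-token tags. The README does not say which tag scheme is used, so the sketch below assumes a BIO scheme. In that scheme, B- marks the first token of an entity, I- marks a continuation, and O marks everything else. The label names and the helpers `extract_entities` and `anonymize` are hypothetical. The sketch shows how predicted tags could be grouped into entity spans and then used for the anonymization use case.

```python
from typing import List, Tuple

# Hypothetical label set: assumes BIO tagging over the four entity types above
# (the README does not state the exact tag scheme).
ENTITY_TYPES = ["ORG", "LOC", "PERS", "MISC"]
LABELS = ["O"] + [f"{prefix}-{etype}" for etype in ENTITY_TYPES for prefix in ("B", "I")]


def extract_entities(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group BIO tags into (entity_type, start, end) spans; end is exclusive."""
    spans = []
    current = None  # (entity_type, start) of the span being built
    for i, tag in enumerate(tags):
        # A B- tag, or an I- tag that does not continue the current type, opens a span.
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and (current is None or current[0] != tag[2:])
        )
        if tag == "O" or starts_new:
            if current is not None:
                spans.append((current[0], current[1], i))
                current = None
            if starts_new:
                current = (tag[2:], i)
        # An I- tag continuing the current type simply extends the span.
    if current is not None:
        spans.append((current[0], current[1], len(tags)))
    return spans


def anonymize(tokens: List[str], tags: List[str]) -> List[str]:
    """Replace each entity span with a placeholder such as [PERS]."""
    out, pos = [], 0
    for etype, start, end in extract_entities(tags):
        out.extend(tokens[pos:start])
        out.append(f"[{etype}]")
        pos = end
    out.extend(tokens[pos:])
    return out


tokens = ["يعمل", "محمد", "علي", "في", "أرامكو", "في", "الرياض"]
tags = ["O", "B-PERS", "I-PERS", "O", "B-ORG", "O", "B-LOC"]
print(extract_entities(tags))  # [('PERS', 1, 3), ('ORG', 4, 5), ('LOC', 6, 7)]
print(" ".join(anonymize(tokens, tags)))
```

A placeholder keeps the entity type visible after anonymization. This is often useful downstream, but the actual output format of this model is not documented.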
Dataset and Preprocessing
The dataset used in this model is split into three parts:
- Training Set: For model training.
- Validation Set: For tuning model hyperparameters and monitoring overfitting.
- Test Set: For evaluating final model performance.
Each sample contains labeled entities, which makes the data suitable for supervised learning.
Preprocessing includes tokenization, normalization, and conversion of the entity labels into a format compatible with TensorFlow.
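The README does not describe the file layout or the exact normalization rules. As an illustration only, the sketch below makes two assumptions. First, it assumes a CoNLL-style layout: one `token tag` pair per line, with blank lines between sentences. Second, it assumes a common, minimal Arabic normalization: removing diacritics and tatweel and unifying the hamzated and madda alef forms. The function names are hypothetical. The resulting padded integer sequences are the kind of input a TensorFlow embedding layer expects. Keras also provides a `pad_sequences` utility for the padding step.

```python
import re
from typing import Dict, Iterable, List, Sequence, Tuple

# Arabic diacritics (tanween, fatha, damma, kasra, shadda, sukun) plus tatweel.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Hamzated/madda alef forms: U+0622, U+0623, U+0625.
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")


def normalize(token: str) -> str:
    """Strip diacritics/tatweel and map alef variants to bare alef (U+0627)."""
    return ALEF_VARIANTS.sub("\u0627", DIACRITICS.sub("", token))


def read_conll(lines: Iterable[str]) -> List[Tuple[List[str], List[str]]]:
    """Parse 'token tag' lines (assumed format); a blank line ends a sentence."""
    sentences, tokens, tags = [], [], []
    for line in lines:
        line = line.strip()
        if not line:
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        token, tag = line.rsplit(None, 1)
        tokens.append(normalize(token))
        tags.append(tag)
    if tokens:  # the last sentence may not be followed by a blank line
        sentences.append((tokens, tags))
    return sentences


def build_vocab(seqs: Iterable[Sequence[str]], specials=("<PAD>", "<UNK>")) -> Dict[str, int]:
    """Assign integer ids in order of first appearance, after the special symbols."""
    vocab = {s: i for i, s in enumerate(specials)}
    for seq in seqs:
        for item in seq:
            vocab.setdefault(item, len(vocab))
    return vocab


def encode(seq: Sequence[str], vocab: Dict[str, int], max_len: int,
           pad_id: int = 0, unk_id: int = 1) -> List[int]:
    """Map items to ids, truncate to max_len, and right-pad with pad_id."""
    ids = [vocab.get(item, unk_id) for item in seq][:max_len]
    return ids + [pad_id] * (max_len - len(ids))


sample = ["ذهب O", "أحمد B-PERS", "إلى O", "القاهرة B-LOC", "",
          "شركة B-ORG", "أرامكو I-ORG"]
sentences = read_conll(sample)
word_vocab = build_vocab(tokens for tokens, _ in sentences)
tag_vocab = build_vocab((tags for _, tags in sentences), specials=("<PAD>",))
print(encode(sentences[1][0], word_vocab, max_len=4))  # [6, 7, 0, 0]
print(encode(sentences[1][1], tag_vocab, max_len=4))   # [4, 5, 0, 0]
```

The same parser could be applied to a file handle, for example `read_conll(open("train.txt", encoding="utf-8"))`, where `train.txt` is a placeholder name. The vocabulary should be built from the training split only. Unseen words in the validation and test splits then map to `<UNK>`, which keeps those splits independent of training.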
Model Evaluation
The model achieved a test accuracy of 0.9675 on the held-out test set, indicating strong performance in recognizing and classifying entities in Arabic text.
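The README reports a single accuracy figure without saying how it was computed. Assume it is token-level accuracy over padded sequences. In that case, padding positions should be excluded from the count. The hypothetical helper below does this. It can optionally skip the O tag as well, which gives a stricter view of how well the entity tokens themselves are classified. The tag ids in the example follow an assumed vocabulary, not the project's.

```python
from typing import Iterable, List


def masked_token_accuracy(y_true: List[List[int]], y_pred: List[List[int]],
                          ignore_ids: Iterable[int] = (0,)) -> float:
    """Fraction of positions where the predicted id equals the gold tag id,
    skipping positions whose gold id is in ignore_ids (e.g. padding)."""
    ignore = set(ignore_ids)
    correct = total = 0
    for true_seq, pred_seq in zip(y_true, y_pred):
        for t, p in zip(true_seq, pred_seq):
            if t in ignore:
                continue
            total += 1
            correct += int(t == p)
    return correct / total if total else 0.0


# Hypothetical tag ids: 0 = <PAD>, 1 = O, 2..5 = entity tags.
y_true = [[1, 2, 1, 3, 0, 0], [4, 5, 0, 0, 0, 0]]
y_pred = [[1, 2, 1, 1, 0, 0], [4, 5, 1, 1, 1, 1]]
print(masked_token_accuracy(y_true, y_pred))                     # 0.8333333333333334
print(masked_token_accuracy(y_true, y_pred, ignore_ids=(0, 1)))  # 0.75
```

Entity-level precision, recall, and F1 are the other common way to report NER results. Under those metrics, a prediction counts as correct only when both the full span and the entity type match.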