---
license: mit
language:
- ar
metrics:
- accuracy
datasets:
- arbml/CLEANANERCorp
pipeline_tag: token-classification
---

### Arabic Named Entity Recognition (NER) Model

# Overview

This is a named entity recognition (NER) model designed specifically for the Arabic language. It was built from scratch, without pretrained models, and recognizes entities such as company names, person names, and cities.

The model is trained with TensorFlow on a custom dataset split into training, validation, and test sets.

# Model Highlights

- Language: Arabic
- Framework: TensorFlow
- Data Format: Plain-text (`.txt`) files with train, validation, and test splits

# Entities Recognized

- ORG: Organizations (e.g., company names)
- LOC: Locations (e.g., cities, countries)
- PERS: Persons (e.g., names, excluding common/popular names)
- MISC: Miscellaneous (e.g., other identifiable private information)

Intended Use: Arabic text processing, personal data anonymization, and data extraction.
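
The anonymization use case can be illustrated with a minimal sketch. It assumes the model emits one BIO-style tag per token (`B-PERS`, `I-PERS`, `O`, and so on); the tagging scheme and the helper names `extract_entities` and `anonymize` are illustrative assumptions, not part of this model's published interface.

```python
from typing import List, Tuple


def extract_entities(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group BIO tags into (entity_type, start, end) spans, end exclusive.

    Assumption: tags look like "B-ORG", "I-ORG", or "O".
    """
    spans = []
    start, current = None, None
    for i, tag in enumerate(tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current
        )
        if starts_new:
            # Close any open span, then open a new one at this token.
            if current is not None:
                spans.append((current, start, i))
            start, current = i, tag[2:]
        elif tag.startswith("I-"):
            continue  # same entity type continues
        else:
            # "O" (or anything else) ends the open span.
            if current is not None:
                spans.append((current, start, i))
            start, current = None, None
    if current is not None:
        spans.append((current, start, len(tags)))
    return spans


def anonymize(tokens: List[str], tags: List[str]) -> str:
    """Replace each entity span with a placeholder such as [PERS]."""
    out, pos = [], 0
    for entity_type, start, end in extract_entities(tags):
        out.extend(tokens[pos:start])
        out.append(f"[{entity_type}]")
        pos = end
    out.extend(tokens[pos:])
    return " ".join(out)


# Hypothetical model output for an eight-token sentence.
tokens = ["يعمل", "أحمد", "علي", "في", "شركة", "أرامكو", "في", "الرياض"]
tags = ["O", "B-PERS", "I-PERS", "O", "O", "B-ORG", "O", "B-LOC"]
print(anonymize(tokens, tags))
```

Each multi-token entity collapses to a single placeholder, so the two-token person name above becomes one `[PERS]`.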

# Dataset and Preprocessing

The dataset used for this model is split into three parts:

- Training Set: For model training.
- Validation Set: For tuning model hyperparameters and monitoring overfitting.
- Test Set: For evaluating final model performance.

Each sample in the dataset carries entity labels for supervised learning.

Preprocessing includes tokenization, normalization, and conversion of the entity labels into a format compatible with TensorFlow.
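
The preprocessing steps above might look roughly like the following sketch. It assumes each line of a split file holds one token and its tag separated by whitespace, with a blank line between sentences. That file layout, the specific normalization rules, and all function names here are assumptions for illustration, not the pipeline actually used for this model.

```python
import re

# Arabic diacritics (U+064B to U+0652) and tatweel (U+0640).
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
# Alef variants with hamza or madda, unified to bare alef (U+0627).
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")


def normalize(token):
    """Assumed normalization: strip diacritics/tatweel, unify alef forms."""
    token = _DIACRITICS.sub("", token)
    return _ALEF_VARIANTS.sub("\u0627", token)


def parse_sentences(lines):
    """Turn 'token tag' lines into a list of (tokens, tags) sentences."""
    sentences, tokens, tags = [], [], []
    for line in lines:
        line = line.strip()
        if not line:  # a blank line closes the current sentence
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        token, tag = line.rsplit(maxsplit=1)
        tokens.append(normalize(token))
        tags.append(tag)
    if tokens:
        sentences.append((tokens, tags))
    return sentences


def build_index(sequences, specials):
    """Map each distinct item to an integer id; specials come first."""
    index = {s: i for i, s in enumerate(specials)}
    for seq in sequences:
        for item in seq:
            index.setdefault(item, len(index))
    return index


def encode(seq, index, max_len, unk=None):
    """Convert items to ids, truncate to max_len, right-pad with id 0."""
    if unk is not None:
        ids = [index.get(item, index[unk]) for item in seq]
    else:
        ids = [index[item] for item in seq]
    ids = ids[:max_len]
    return ids + [0] * (max_len - len(ids))


sample = ["أحمد B-PERS", "يعمل O", "", "الرياض B-LOC"]
sentences = parse_sentences(sample)
word_index = build_index([s[0] for s in sentences], ["<PAD>", "<UNK>"])
tag_index = build_index([s[1] for s in sentences], ["<PAD>"])
x = [encode(toks, word_index, 4, unk="<UNK>") for toks, _ in sentences]
y = [encode(tgs, tag_index, 4) for _, tgs in sentences]
```

The resulting fixed-length integer lists `x` and `y` can then be converted to tensors, for example with `tf.constant`, before being fed to a model.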

# Model Evaluation

The model achieved an accuracy of 0.9675 on the test set, indicating strong performance in recognizing and classifying entities in Arabic text.
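
This card does not state how the accuracy figure was computed. One common reading is token-level accuracy with padding positions excluded. The sketch below shows that calculation under this assumption; `token_accuracy` and the `pad_id=0` convention are hypothetical names chosen for illustration.

```python
def token_accuracy(true_ids, pred_ids, pad_id=0):
    """Fraction of non-padding tokens whose predicted tag id is correct.

    Assumption: sequences are padded with pad_id, and padded positions
    should not count toward the score.
    """
    correct = total = 0
    for true_seq, pred_seq in zip(true_ids, pred_ids):
        for t, p in zip(true_seq, pred_seq):
            if t == pad_id:
                continue  # skip padding positions
            total += 1
            correct += int(t == p)
    return correct / total if total else 0.0


# Two padded sentences with made-up tag ids (0 is padding).
true_ids = [[1, 2, 0, 0], [3, 2, 2, 0]]
pred_ids = [[1, 2, 2, 0], [3, 1, 2, 2]]
print(token_accuracy(true_ids, pred_ids))
```

In this toy example, five positions are real tokens and four of them are predicted correctly; predictions made at padded positions are ignored.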