---
license: mit
language:
- ar
metrics:
- accuracy
datasets:
- arbml/CLEANANERCorp
pipeline_tag: token-classification
---
### Arabic Named Entity Recognition (NER) Model
# Overview
This is a named entity recognition (NER) model designed specifically for the Arabic language. It was built from scratch, without pretrained models, and recognizes entities such as company names, personal names, and cities.
The model is trained with TensorFlow on a custom dataset split into training, validation, and test sets.
# Model Highlights
- Language: Arabic
- Framework: TensorFlow
- Data Format: Text files (txt format) with train, validation, and test splits
# Entities Recognized
- ORG: Organizations (e.g., company names)
- LOC: Locations (e.g., cities, countries)
- PERS: Persons (e.g., names, excluding common/popular names)
- MISC: Miscellaneous (e.g., other identifiable private information)
- Intended Use: Arabic text processing, personal data anonymization, and data extraction.
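
For the anonymization use case, the sketch below shows one way predicted tags could be turned into masked text. It assumes BIO-style tags such as `B-PERS` and `I-PERS`, which this card does not confirm; the `anonymize` helper and the sample sentence are hypothetical and are not part of the released model.

```python
def anonymize(tokens, tags):
    """Replace each entity span with a single [TYPE] placeholder.

    Assumes BIO-style tags ("B-PERS", "I-PERS", "O"); this scheme is an
    assumption, not something stated in the model card.
    """
    out = []
    prev_type = None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            out.append(token)
            prev_type = None
            continue
        prefix, _, ent_type = tag.partition("-")
        # A new span starts on "B-", or on an "I-" that does not
        # continue an entity of the same type.
        if prefix == "B" or ent_type != prev_type:
            out.append(f"[{ent_type}]")
        prev_type = ent_type
    return out


# Hypothetical model output for a short sentence.
tokens = ["زار", "أحمد", "علي", "مدينة", "القاهرة"]
tags = ["O", "B-PERS", "I-PERS", "O", "B-LOC"]
masked = anonymize(tokens, tags)
print(" ".join(masked))
```

Collapsing a multi-token entity into one placeholder keeps the masked text readable; emitting one placeholder per token would be an equally valid choice if token alignment must be preserved.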
# Dataset and Preprocessing
The dataset used in this model is split into three parts:
- Training Set: For model training.
- Validation Set: For tuning model hyperparameters and monitoring overfitting.
- Test Set: For evaluating final model performance.
Each sample in the dataset carries entity labels for supervised learning.
Preprocessing includes tokenization, normalization, and conversion of the entity labels into a TensorFlow-compatible format.
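
As an illustration of these steps, here is a minimal sketch in plain Python. It assumes the text files hold one `token tag` pair per line with blank lines between sentences, and that normalization means stripping diacritics and unifying alef variants. Neither assumption is confirmed by this card, and all function names are hypothetical.

```python
import re

# Arabic diacritics (U+064B to U+0652) and tatweel (U+0640).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Alef with hamza above, hamza below, and madda.
_ALEF_VARIANTS = re.compile("[\u0623\u0625\u0622]")


def normalize(token):
    """Strip diacritics and map alef variants to bare alef (assumed rules)."""
    token = _DIACRITICS.sub("", token)
    return _ALEF_VARIANTS.sub("\u0627", token)


def read_conll(lines):
    """Parse 'token tag' lines; blank lines separate sentences."""
    sentences, tokens, tags = [], [], []
    for line in lines:
        line = line.strip()
        if not line:
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        token, tag = line.rsplit(maxsplit=1)
        tokens.append(normalize(token))
        tags.append(tag)
    if tokens:
        sentences.append((tokens, tags))
    return sentences


def build_vocab(items, specials):
    """Assign integer ids, reserving the lowest ids for special symbols."""
    vocab = {s: i for i, s in enumerate(specials)}
    for item in items:
        vocab.setdefault(item, len(vocab))
    return vocab


def encode(seq, vocab, max_len, pad_id=0, unk_id=None):
    """Map symbols to ids, then truncate or right-pad to max_len."""
    ids = [vocab.get(x, unk_id) for x in seq][:max_len]
    return ids + [pad_id] * (max_len - len(ids))


# Tiny made-up sample; the first token carries diacritics on purpose.
sample = ["م\u064Fح\u064Eم\u0651\u064Eد B-PERS", "في O", "أنقرة B-LOC", "", "شركة O"]
sentences = read_conll(sample)
word_vocab = build_vocab((t for toks, _ in sentences for t in toks), ["<PAD>", "<UNK>"])
tag_vocab = build_vocab((t for _, tgs in sentences for t in tgs), ["<PAD>"])
X = [encode(toks, word_vocab, 4, unk_id=1) for toks, _ in sentences]
y = [encode(tgs, tag_vocab, 4) for _, tgs in sentences]
print(X, y)
```

The resulting fixed-length integer lists can be passed to `tf.constant` or `tf.data.Dataset.from_tensor_slices` to obtain TensorFlow inputs.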
# Model Evaluation
The model achieved a test accuracy of 0.9675, indicating strong performance in recognizing and classifying entities in Arabic text.
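
The card does not say how this accuracy is computed. The sketch below shows one common definition, token-level accuracy over non-padding positions, under the assumption that label id `0` marks padding; the `token_accuracy` function and the toy labels are illustrative only.

```python
def token_accuracy(y_true, y_pred, pad_id=0):
    """Fraction of non-padding tokens whose predicted label id is correct."""
    correct = total = 0
    for true_seq, pred_seq in zip(y_true, y_pred):
        for t, p in zip(true_seq, pred_seq):
            if t == pad_id:
                continue  # padded positions do not count toward the score
            total += 1
            correct += int(t == p)
    return correct / total if total else 0.0


# Toy label ids: four real tokens, three predicted correctly.
y_true = [[1, 2, 3, 0], [2, 0, 0, 0]]
y_pred = [[1, 2, 2, 0], [2, 2, 0, 0]]
acc = token_accuracy(y_true, y_pred)
print(acc)  # 0.75
```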