# Personally Identifiable Information (PII) Model
This model is a fine-tuned version of bert-base-cased on the generator dataset. It achieves the following results:
- Training Loss: 0.003900
- Validation Loss: 0.051071
- Precision: 95.53%
- Recall: 96.60%
- F1: 96%
- Accuracy: 99.11%
## Model description
Meet our digital safeguard: a token classification model with a knack for spotting personally identifiable information (PII). Built on the BERT architecture and fine-tuned on a custom dataset, it swiftly detects names, addresses, dates of birth, and more. With each token it encounters, it acts as a vigilant guardian, keeping sensitive information shielded from prying eyes and making the digital realm a safer place to explore. A minimal usage sketch follows the entity list below.
### Entity groups the model can detect
- ACCOUNTNUMBER
- FIRSTNAME
- ACCOUNTNAME
- PHONENUMBER
- CREDITCARDCVV
- CREDITCARDISSUER
- PREFIX
- LASTNAME
- AMOUNT
- DATE
- DOB
- COMPANYNAME
- BUILDINGNUMBER
- STREET
- SECONDARYADDRESS
- STATE
- CITY
- CREDITCARDNUMBER
- SSN
- URL
- USERNAME
- PASSWORD
- COUNTY
- PIN
- MIDDLENAME
- IBAN
- GENDER
- AGE
- ZIPCODE
- SEX
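To try the model, the sketch below uses the 🤗 Transformers `pipeline` API with this repo's id (`ab-ai/pii_model`); the example sentence is illustrative only and not taken from the training data.

```python
from transformers import pipeline

# Token-classification pipeline for ab-ai/pii_model; "simple" aggregation
# merges word-piece tokens back into whole entity spans.
pii_detector = pipeline(
    "token-classification",
    model="ab-ai/pii_model",
    aggregation_strategy="simple",
)

text = (
    "Hi, I'm John Doe. I live at 221 Baker Street, London, "
    "and my phone number is 555-123-4567."
)
for entity in pii_detector(text):
    # Each prediction carries the entity group, the matched text span and a score.
    print(entity["entity_group"], "->", entity["word"], round(float(entity["score"]), 3))
```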
## Training hyperparameters
The following hyperparameters were used during training:
| Hyperparameter | Value |
|---|---|
| Learning Rate | 5e-5 |
| Train Batch Size | 16 |
| Eval Batch Size | 16 |
| Number of Training Epochs | 7 |
| Weight Decay | 0.01 |
| Save Strategy | Epoch |
| Load Best Model at End | True |
| Metric for Best Model | F1 |
| Push to Hub | True |
| Evaluation Strategy | Epoch |
| Early Stopping Patience | 3 |
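For reference, here is a sketch of how these settings map onto the 🤗 `Trainer` API. It is not the author's original script: `label_list`, `tokenized_datasets`, and `compute_metrics` are assumed to be prepared elsewhere (a `compute_metrics` sketch is given under "Training results").

```python
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

# Assumed placeholders: `label_list` (BIO tags for the entity groups above)
# and `tokenized_datasets` (tokenized train/validation splits).
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
data_collator = DataCollatorForTokenClassification(tokenizer)
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(label_list)
)

training_args = TrainingArguments(
    output_dir="pii_model",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=7,
    weight_decay=0.01,
    save_strategy="epoch",
    evaluation_strategy="epoch",  # argument name as of Transformers 4.38
    load_best_model_at_end=True,
    metric_for_best_model="f1",
    push_to_hub=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```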
## Training results
| Epoch | Training Loss | Validation Loss | Precision (%) | Recall (%) | F1 Score (%) | Accuracy (%) |
|---|---|---|---|---|---|---|
| 1 | 0.0443 | 0.038108 | 91.88 | 95.17 | 93.50 | 98.80 |
| 2 | 0.0318 | 0.035728 | 94.13 | 96.15 | 95.13 | 98.90 |
| 3 | 0.0209 | 0.032016 | 94.81 | 96.42 | 95.61 | 99.01 |
| 4 | 0.0154 | 0.040221 | 93.87 | 95.80 | 94.82 | 98.88 |
| 5 | 0.0084 | 0.048183 | 94.21 | 96.06 | 95.13 | 98.93 |
| 6 | 0.0037 | 0.052281 | 94.49 | 96.60 | 95.53 | 99.07 |
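The per-epoch precision, recall, F1, and accuracy above are entity-level metrics of the kind produced by `seqeval`. Assuming that style of evaluation, the `compute_metrics` function referenced in the training sketch could look like this (`label_list` is again an assumed placeholder, recoverable from the model config's `id2label` mapping):

```python
import numpy as np
import evaluate

seqeval = evaluate.load("seqeval")

def compute_metrics(eval_pred):
    """Entity-level precision/recall/F1/accuracy for token classification."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)

    # Drop special/sub-word positions that were labelled -100 during tokenization.
    true_labels = [
        [label_list[l] for l in label_row if l != -100]
        for label_row in labels
    ]
    true_predictions = [
        [label_list[p] for p, l in zip(pred_row, label_row) if l != -100]
        for pred_row, label_row in zip(predictions, labels)
    ]

    results = seqeval.compute(predictions=true_predictions, references=true_labels)
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }
```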
## Author
## Framework versions
- Transformers 4.38.2
- Pytorch 2.1.0+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2