---
license: cc-by-nc-4.0
base_model: microsoft/deberta-v3-base
tags:
- generated_from_trainer
model-index:
- name: deberta-v3-base_finetuned_ai4privacy_v2
  results: []
datasets:
- ai4privacy/pii-masking-200k
- Isotonic/pii-masking-200k
language:
- en
metrics:
- seqeval
pipeline_tag: token-classification
---

# deberta-v3-base_finetuned_ai4privacy_v2

This model is a fine-tuned version of [microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) on the [ai4privacy/pii-masking-200k](https://huggingface.co/datasets/ai4privacy/pii-masking-200k) dataset.

## Usage

GitHub implementation: [Ai4Privacy](https://github.com/Sripaad/ai4privacy)

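The card itself does not include an inference snippet, so the following is a minimal sketch of how a token-classification checkpoint like this one is commonly used for masking. The Hub repository ID `Isotonic/deberta-v3-base_finetuned_ai4privacy_v2`, the helper names `load_pii_pipeline` and `mask_pii`, and the `[LABEL]` placeholder format are assumptions made for illustration; they are not taken from the GitHub implementation.

```python
from typing import Dict, List

# Assumed Hub repository ID; adjust it to wherever the checkpoint is hosted.
MODEL_ID = "Isotonic/deberta-v3-base_finetuned_ai4privacy_v2"


def load_pii_pipeline(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily so mask_pii works without it

    # aggregation_strategy="simple" merges word pieces into entity spans that
    # carry "entity_group", "start" and "end" keys.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def mask_pii(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with a [LABEL] placeholder (hypothetical format)."""
    masked = text
    # Work right to left so earlier character offsets stay valid after each edit.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        masked = masked[: ent["start"]] + f"[{ent['entity_group']}]" + masked[ent["end"]:]
    return masked
```

With the `transformers` library installed, a call such as `mask_pii(text, load_pii_pipeline()(text))` would mask whatever spans the model predicts. The label names that appear in the placeholders depend on the checkpoint's configuration.
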
## Model description

This model has been fine-tuned on the world's largest open-source privacy dataset.

The model is intended to detect and remove personally identifiable information (PII) from text, especially in the context of AI assistants and LLMs.

The dataset's example texts cover 54 PII classes (types of sensitive data) and target 229 discussion subjects or use cases across the business, education, psychology, and legal fields, in 5 interaction styles (e.g. casual conversation, formal documents, and emails).

See the GitHub implementation for more specific details.

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 6e-04
- train_batch_size: 32
- eval_batch_size: 32
- seed: 412
- optimizer: Adam with betas=(0.96,0.996) and epsilon=1e-08
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_ratio: 0.22
- num_epochs: 7
- mixed_precision_training: N/A

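To make the `cosine_with_restarts` schedule and the 0.22 warmup ratio concrete, here is a plain-Python sketch of the resulting learning-rate curve. It assumes 16,751 total optimizer steps (the final step count in the training results table), a single cosine cycle (`NUM_CYCLES = 1`, which the card does not state), and warmup steps rounded up. The function name `lr_at_step` is illustrative.

```python
import math

PEAK_LR = 6e-4        # learning_rate from the list above
WARMUP_RATIO = 0.22   # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 16751   # assumption: last step in the training results table
NUM_CYCLES = 1        # assumption: the card does not state the number of restarts

WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at_step(step: int) -> float:
    """Learning rate under linear warmup followed by cosine decay with hard restarts."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    if progress >= 1.0:
        return 0.0
    # Each cycle decays from the peak towards 0, then restarts at the peak.
    cycle_pos = (NUM_CYCLES * progress) % 1.0
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * cycle_pos))


for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:>5}: lr = {lr_at_step(step):.2e}")
```

Under these assumptions the warmup phase covers roughly the first 1.5 of the 7 epochs, which is longer than the warmup typically used for fine-tuning runs.
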
## Class-wise metrics

The model achieves the following results on the evaluation set:

- Loss: 0.0211
- Overall Precision: 0.9722
- Overall Recall: 0.9792
- Overall F1: 0.9757
- Overall Accuracy: 0.9915

- Accountname F1: 0.9993
- Accountnumber F1: 0.9986
- Age F1: 0.9884
- Amount F1: 0.9984
- Bic F1: 0.9942
- Bitcoinaddress F1: 0.9974
- Buildingnumber F1: 0.9898
- City F1: 1.0
- Companyname F1: 1.0
- County F1: 0.9976
- Creditcardcvv F1: 0.9541
- Creditcardissuer F1: 0.9970
- Creditcardnumber F1: 0.9754
- Currency F1: 0.8966
- Currencycode F1: 0.9946
- Currencyname F1: 0.7697
- Currencysymbol F1: 0.9958
- Date F1: 0.9778
- Dob F1: 0.9546
- Email F1: 1.0
- Ethereumaddress F1: 1.0
- Eyecolor F1: 0.9925
- Firstname F1: 0.9947
- Gender F1: 1.0
- Height F1: 1.0
- Iban F1: 0.9978
- Ip F1: 0.5404
- Ipv4 F1: 0.8455
- Ipv6 F1: 0.8855
- Jobarea F1: 0.9091
- Jobtitle F1: 1.0
- Jobtype F1: 0.9672
- Lastname F1: 0.9855
- Litecoinaddress F1: 0.9949
- Mac F1: 0.9965
- Maskednumber F1: 0.9836
- Middlename F1: 0.7385
- Nearbygpscoordinate F1: 1.0
- Ordinaldirection F1: 1.0
- Password F1: 1.0
- Phoneimei F1: 0.9978
- Phonenumber F1: 0.9975
- Pin F1: 0.9820
- Prefix F1: 0.9872
- Secondaryaddress F1: 1.0
- Sex F1: 0.9916
- Ssn F1: 0.9960
- State F1: 0.9967
- Street F1: 0.9991
- Time F1: 1.0
- Url F1: 1.0
- Useragent F1: 0.9981
- Username F1: 1.0
- Vehiclevin F1: 0.9950
- Vehiclevrm F1: 0.9870
- Zipcode F1: 0.9966

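As a quick consistency check on the numbers above, the overall F1 can be recomputed as the harmonic mean of the reported precision and recall, and the per-class list can be scanned for weak classes. The sketch below copies only a handful of the per-class values. The 0.90 threshold and the helper names are arbitrary choices for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def weakest_classes(per_class_f1: dict, threshold: float = 0.90) -> list:
    """Return (class, F1) pairs below the threshold, lowest F1 first."""
    below = [(name, f1) for name, f1 in per_class_f1.items() if f1 < threshold]
    return sorted(below, key=lambda item: item[1])


# Overall precision and recall as reported above.
overall_f1 = f1_score(0.9722, 0.9792)
print(f"Recomputed overall F1: {overall_f1:.4f}")

# A subset of the per-class F1 values reported above.
per_class_f1 = {
    "Currency": 0.8966,
    "Currencyname": 0.7697,
    "Email": 1.0,
    "Ip": 0.5404,
    "Ipv4": 0.8455,
    "Ipv6": 0.8855,
    "Middlename": 0.7385,
    "Zipcode": 0.9966,
}
for name, f1 in weakest_classes(per_class_f1):
    print(f"{name}: {f1:.4f}")
```

The recomputed value rounds to the reported overall F1 of 0.9757. The six classes the scan flags (Ip, Middlename, Currencyname, Ipv4, Ipv6, Currency) are the only ones in the list above with an F1 below 0.90, so they are the ones to watch when using the model for masking.
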
### Training results

| Training Loss | Epoch | Step | Validation Loss | Overall Precision | Overall Recall | Overall F1 | Overall Accuracy | Accountname F1 | Accountnumber F1 | Age F1 | Amount F1 | Bic F1 | Bitcoinaddress F1 | Buildingnumber F1 | City F1 | Companyname F1 | County F1 | Creditcardcvv F1 | Creditcardissuer F1 | Creditcardnumber F1 | Currency F1 | Currencycode F1 | Currencyname F1 | Currencysymbol F1 | Date F1 | Dob F1 | Email F1 | Ethereumaddress F1 | Eyecolor F1 | Firstname F1 | Gender F1 | Height F1 | Iban F1 | Ip F1 | Ipv4 F1 | Ipv6 F1 | Jobarea F1 | Jobtitle F1 | Jobtype F1 | Lastname F1 | Litecoinaddress F1 | Mac F1 | Maskednumber F1 | Middlename F1 | Nearbygpscoordinate F1 | Ordinaldirection F1 | Password F1 | Phoneimei F1 | Phonenumber F1 | Pin F1 | Prefix F1 | Secondaryaddress F1 | Sex F1 | Ssn F1 | State F1 | Street F1 | Time F1 | Url F1 | Useragent F1 | Username F1 | Vehiclevin F1 | Vehiclevrm F1 | Zipcode F1 |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|:--------------:|:----------:|:----------------:|:--------------:|:----------------:|:------:|:---------:|:------:|:-----------------:|:-----------------:|:-------:|:--------------:|:---------:|:----------------:|:-------------------:|:-------------------:|:-----------:|:---------------:|:---------------:|:-----------------:|:-------:|:------:|:--------:|:------------------:|:-----------:|:------------:|:---------:|:---------:|:-------:|:------:|:-------:|:-------:|:----------:|:-----------:|:----------:|:-----------:|:------------------:|:------:|:---------------:|:-------------:|:----------------------:|:-------------------:|:-----------:|:------------:|:--------------:|:------:|:---------:|:-------------------:|:------:|:------:|:--------:|:---------:|:-------:|:------:|:------------:|:-----------:|:-------------:|:-------------:|:----------:|
| 0.3984 | 1.0 | 2393 | 0.5120 | 0.7268 | 0.7819 | 0.7533 | 0.8741 | 0.9265 | 0.9819 | 0.8237 | 0.5053 | 0.2315 | 0.8197 | 0.7840 | 0.4886 | 0.8657 | 0.6338 | 0.8775 | 0.8575 | 0.7152 | 0.4533 | 0.0959 | 0.0 | 0.6480 | 0.7621 | 0.1884 | 0.9840 | 1.0 | 0.6194 | 0.8740 | 0.6610 | 0.9642 | 0.9039 | 0.0 | 0.8500 | 0.0220 | 0.6325 | 0.7840 | 0.6899 | 0.7667 | 0.0 | 0.2966 | 0.0 | 0.3682 | 0.9986 | 0.9387 | 0.8558 | 0.9879 | 0.9687 | 0.7455 | 0.9252 | 0.9661 | 0.9110 | 0.9771 | 0.5282 | 0.7988 | 0.8453 | 0.9648 | 0.9804 | 0.9356 | 0.7741 | 0.6780 | 0.7915 |
| 0.2097 | 2.0 | 4786 | 0.1406 | 0.8392 | 0.8913 | 0.8645 | 0.9509 | 0.9760 | 0.9114 | 0.9227 | 0.7647 | 0.9190 | 0.9554 | 0.8975 | 0.8881 | 0.9535 | 0.8414 | 0.9114 | 0.9820 | 0.8503 | 0.7525 | 0.6171 | 0.0077 | 0.8787 | 0.3161 | 0.2847 | 0.9924 | 0.9918 | 0.9495 | 0.9076 | 0.9625 | 0.9890 | 0.9870 | 0.0 | 0.8484 | 0.8007 | 0.8651 | 0.9660 | 0.9164 | 0.8695 | 0.8756 | 0.9685 | 0.7768 | 0.6697 | 0.9956 | 0.9754 | 0.9652 | 0.9976 | 0.9849 | 0.7977 | 0.9373 | 0.9923 | 0.9815 | 0.9828 | 0.8093 | 0.9445 | 0.9735 | 0.9933 | 0.9651 | 0.9854 | 0.9843 | 0.975 | 0.8123 |
| 0.1271 | 3.0 | 7179 | 0.1049 | 0.9218 | 0.9312 | 0.9265 | 0.9618 | 0.9950 | 0.9880 | 0.9172 | 0.9309 | 0.9652 | 0.8222 | 0.9160 | 0.9364 | 0.9749 | 0.9556 | 0.9211 | 0.9856 | 0.8939 | 0.8237 | 0.76 | 0.0080 | 0.9360 | 0.8735 | 0.5567 | 0.9993 | 0.9973 | 0.9872 | 0.9547 | 0.9773 | 0.9574 | 0.9694 | 0.0 | 0.8510 | 0.8032 | 0.9404 | 0.9844 | 0.9522 | 0.9294 | 0.8584 | 1.0 | 0.8603 | 0.8908 | 1.0 | 0.9829 | 0.9513 | 1.0 | 0.9792 | 0.8579 | 0.9413 | 0.9968 | 0.9513 | 0.9929 | 0.9278 | 0.9484 | 0.9862 | 0.9940 | 0.8884 | 0.9943 | 0.9616 | 0.9648 | 0.9395 |
| 0.1345 | 4.0 | 9572 | 0.0941 | 0.9463 | 0.9580 | 0.9521 | 0.9659 | 0.9975 | 0.9979 | 0.9356 | 0.9597 | 0.9084 | 0.9569 | 0.9827 | 0.9734 | 0.9835 | 0.9780 | 0.9634 | 0.9904 | 0.9393 | 0.8542 | 0.8915 | 0.4069 | 0.9636 | 0.8873 | 0.6572 | 0.9993 | 1.0 | 0.9923 | 0.9796 | 0.9983 | 0.9917 | 0.9972 | 0.0 | 0.8515 | 0.8027 | 0.9689 | 0.9943 | 0.9685 | 0.9668 | 0.8162 | 0.9912 | 0.9110 | 0.9364 | 1.0 | 0.9848 | 0.9734 | 0.9976 | 0.9949 | 0.9739 | 0.9609 | 0.9968 | 0.9906 | 0.9899 | 0.9772 | 0.9875 | 0.9855 | 0.9978 | 1.0 | 0.9972 | 0.9867 | 0.9817 | 0.9780 |
| 0.1067 | 5.0 | 11965 | 0.0724 | 0.9556 | 0.9659 | 0.9607 | 0.9699 | 0.9967 | 0.9965 | 0.9705 | 0.9742 | 0.9892 | 0.9736 | 0.9891 | 0.9794 | 0.9951 | 0.9860 | 0.9897 | 0.9892 | 0.9517 | 0.8386 | 0.9770 | 0.4186 | 0.9822 | 0.8869 | 0.7016 | 1.0 | 1.0 | 0.9949 | 0.9859 | 0.9983 | 1.0 | 0.9954 | 0.0075 | 0.8569 | 0.8012 | 0.9819 | 0.9979 | 0.9856 | 0.9843 | 0.9383 | 1.0 | 0.9318 | 0.9461 | 1.0 | 0.9905 | 1.0 | 1.0 | 0.9978 | 0.9906 | 0.9646 | 0.9981 | 0.9924 | 0.9970 | 0.9862 | 0.9966 | 0.9951 | 0.9970 | 1.0 | 0.9981 | 0.9933 | 1.0 | 0.9913 |
| 0.0808 | 6.0 | 14358 | 0.0693 | 0.9664 | 0.9732 | 0.9698 | 0.9728 | 1.0 | 1.0 | 0.9760 | 0.9897 | 0.9978 | 0.9907 | 0.9906 | 0.9930 | 0.9994 | 0.9939 | 1.0 | 0.9891 | 0.9590 | 0.9052 | 0.9875 | 0.7022 | 0.9892 | 0.9126 | 0.7438 | 1.0 | 1.0 | 1.0 | 0.9934 | 0.9991 | 1.0 | 1.0 | 0.1551 | 0.8393 | 0.8034 | 0.9942 | 0.9993 | 0.9928 | 0.9877 | 0.9770 | 1.0 | 0.9451 | 0.9773 | 1.0 | 0.9924 | 1.0 | 1.0 | 1.0 | 0.9929 | 0.9722 | 0.9974 | 0.9949 | 0.9970 | 0.9941 | 0.9972 | 0.9967 | 1.0 | 1.0 | 0.9991 | 1.0 | 1.0 | 0.9890 |
| 0.0779 | 7.0 | 16751 | 0.0697 | 0.9698 | 0.9756 | 0.9727 | 0.9739 | 0.9983 | 1.0 | 0.9815 | 0.9904 | 1.0 | 0.9938 | 0.9935 | 0.9930 | 0.9994 | 0.9935 | 1.0 | 0.9903 | 0.9584 | 0.9206 | 0.9917 | 0.7753 | 0.9914 | 0.9315 | 0.8305 | 1.0 | 1.0 | 1.0 | 0.9939 | 1.0 | 1.0 | 1.0 | 0.1404 | 0.8382 | 0.8029 | 0.9958 | 1.0 | 0.9944 | 0.9910 | 0.9875 | 1.0 | 0.9480 | 0.9788 | 1.0 | 0.9924 | 1.0 | 1.0 | 1.0 | 0.9929 | 0.9747 | 0.9961 | 0.9949 | 0.9970 | 0.9925 | 0.9983 | 0.9967 | 1.0 | 1.0 | 0.9991 | 1.0 | 1.0 | 0.9953 |

### Framework versions

- Transformers 4.35.2
- PyTorch 2.1.0+cu118
- Datasets 2.15.0
- Tokenizers 0.15.0