Isotonic/deberta-v3-base_finetuned_ai4privacy_v2

@@ -6,6 +6,13 @@ tags:
 model-index:
 - name: deberta-v3-base_finetuned_ai4privacy_v2
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -13,13 +20,51 @@ should probably proofread and complete it, then remove this comment. -->
 # deberta-v3-base_finetuned_ai4privacy_v2
-This model is a fine-tuned version of [microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) on the None dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.0211
 - Overall Precision: 0.9722
 - Overall Recall: 0.9792
 - Overall F1: 0.9757
 - Overall Accuracy: 0.9915
 - Accountname F1: 0.9993
 - Accountnumber F1: 0.9986
 - Age F1: 0.9884
@@ -77,33 +122,6 @@ It achieves the following results on the evaluation set:
 - Vehiclevrm F1: 0.9870
 - Zipcode F1: 0.9966
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 5e-05
-- train_batch_size: 16
-- eval_batch_size: 16
-- seed: 42
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: cosine_with_restarts
-- lr_scheduler_warmup_ratio: 0.2
-- num_epochs: 10
-- mixed_precision_training: Native AMP
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Overall Precision | Overall Recall | Overall F1 | Overall Accuracy | Accountname F1 | Accountnumber F1 | Age F1 | Amount F1 | Bic F1 | Bitcoinaddress F1 | Buildingnumber F1 | City F1 | Companyname F1 | County F1 | Creditcardcvv F1 | Creditcardissuer F1 | Creditcardnumber F1 | Currency F1 | Currencycode F1 | Currencyname F1 | Currencysymbol F1 | Date F1 | Dob F1 | Email F1 | Ethereumaddress F1 | Eyecolor F1 | Firstname F1 | Gender F1 | Height F1 | Iban F1 | Ip F1  | Ipv4 F1 | Ipv6 F1 | Jobarea F1 | Jobtitle F1 | Jobtype F1 | Lastname F1 | Litecoinaddress F1 | Mac F1 | Maskednumber F1 | Middlename F1 | Nearbygpscoordinate F1 | Ordinaldirection F1 | Password F1 | Phoneimei F1 | Phonenumber F1 | Pin F1 | Prefix F1 | Secondaryaddress F1 | Sex F1 | Ssn F1 | State F1 | Street F1 | Time F1 | Url F1 | Useragent F1 | Username F1 | Vehiclevin F1 | Vehiclevrm F1 | Zipcode F1 |
@@ -125,4 +143,4 @@ The following hyperparameters were used during training:
 - Transformers 4.35.2
 - Pytorch 2.1.0+cu118
 - Datasets 2.15.0
-- Tokenizers 0.15.0

 model-index:
 - name: deberta-v3-base_finetuned_ai4privacy_v2
   results: []
+datasets:
+- ai4privacy/pii-masking-200k
+language:
+- en
+metrics:
+- seqeval
+pipeline_tag: token-classification
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 # deberta-v3-base_finetuned_ai4privacy_v2
+This model is a fine-tuned version of [microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) on the [ai4privacy/pii-masking-200k](https://huggingface.co/datasets/ai4privacy/pii-masking-200k) dataset.
+## Usage
+GitHub Implementation: [Ai4Privacy](https://github.com/Sripaad/ai4privacy)
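A minimal sketch of how the model might be used for PII masking, assuming it loads through the standard `transformers` token-classification pipeline. The model id is taken from the repository name at the top of this page; the helper names `detect_pii` and `mask_pii` are hypothetical and are not part of the linked GitHub implementation.

```python
from typing import Dict, List

# Assumption: repository id as shown in the page header.
MODEL_ID = "Isotonic/deberta-v3-base_finetuned_ai4privacy_v2"


def mask_pii(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with a [LABEL] placeholder.

    `entities` is expected to look like the output of a token-classification
    pipeline run with an aggregation strategy: dicts carrying `start`, `end`
    (character offsets) and `entity_group` (the predicted class).
    """
    masked = text
    # Work from right to left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group']}]"
        masked = masked[: ent["start"]] + placeholder + masked[ent["end"]:]
    return masked


def detect_pii(text: str, model_id: str = MODEL_ID) -> List[Dict]:
    """Run the fine-tuned model on `text` (downloads weights on first call)."""
    # Imported lazily because transformers is a heavy dependency.
    from transformers import pipeline

    tagger = pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge word pieces into whole spans
    )
    return tagger(text)


# Example call (requires transformers, a backend such as PyTorch, and network access):
#   text = "My name is Jane Doe and my email is jane@example.com."
#   print(mask_pii(text, detect_pii(text)))
```

Masking from the last span to the first avoids recomputing offsets after each replacement. The exact label strings in the output depend on the model's configuration.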
+## Model description
+This model has been fine-tuned on the world's largest open-source privacy dataset.
+The model is intended to remove personally identifiable information (PII) from text, especially in the context of AI assistants and LLMs.
+The example texts cover 54 PII classes (types of sensitive data) and 229 discussion subjects / use cases split across business, education, psychology, and legal fields, in 5 interaction styles (e.g. casual conversation, formal document, email).
+See the GitHub implementation for details of the research.
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 6e-04
+- train_batch_size: 16
+- eval_batch_size: 16
+- seed: 42
+- optimizer: Adam with betas=(0.96,0.996) and epsilon=1e-08
+- lr_scheduler_type: cosine_with_restarts
+- lr_scheduler_warmup_ratio: 0.2
+- num_epochs: 10
+- mixed_precision_training: Native AMP
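To illustrate how `learning_rate`, `lr_scheduler_warmup_ratio`, and `cosine_with_restarts` interact, the sketch below computes the learning rate at a given step in plain Python. It is believed to follow the same formula as the hard-restart cosine schedule in `transformers`, with a single cycle assumed as the default. The total step count is a hypothetical value, since the training log is not shown here.

```python
import math

PEAK_LR = 6e-4       # learning_rate from the list above
WARMUP_RATIO = 0.2   # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 1000   # hypothetical: the real number of optimizer steps is not given


def lr_factor(step: int, total_steps: int = TOTAL_STEPS,
              warmup_ratio: float = WARMUP_RATIO, num_cycles: int = 1) -> float:
    """Multiplier applied to the peak learning rate at `step`."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 to 1.
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Cosine decay that restarts at the top of each cycle.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0))))


def learning_rate(step: int) -> float:
    """Learning rate at `step` under the assumptions above."""
    return PEAK_LR * lr_factor(step)


if __name__ == "__main__":
    for s in (0, 100, 200, 600, 1000):
        print(s, learning_rate(s))
```

With these assumptions the rate climbs linearly over the first 20% of steps, peaks at 6e-04, and then decays along a cosine curve to zero.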
+## Class-wise metrics
 It achieves the following results on the evaluation set:
 - Loss: 0.0211
 - Overall Precision: 0.9722
 - Overall Recall: 0.9792
 - Overall F1: 0.9757
 - Overall Accuracy: 0.9915
 - Accountname F1: 0.9993
 - Accountnumber F1: 0.9986
 - Age F1: 0.9884
 - Vehiclevrm F1: 0.9870
 - Zipcode F1: 0.9966
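As a quick consistency check on the overall figures above, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported (rounded) precision and recall; the function name `f1_score` is illustrative and is not the seqeval API.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall as reported in the list above.
overall_f1 = f1_score(0.9722, 0.9792)
print(round(overall_f1, 4))  # 0.9757, matching the reported Overall F1
```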
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Overall Precision | Overall Recall | Overall F1 | Overall Accuracy | Accountname F1 | Accountnumber F1 | Age F1 | Amount F1 | Bic F1 | Bitcoinaddress F1 | Buildingnumber F1 | City F1 | Companyname F1 | County F1 | Creditcardcvv F1 | Creditcardissuer F1 | Creditcardnumber F1 | Currency F1 | Currencycode F1 | Currencyname F1 | Currencysymbol F1 | Date F1 | Dob F1 | Email F1 | Ethereumaddress F1 | Eyecolor F1 | Firstname F1 | Gender F1 | Height F1 | Iban F1 | Ip F1  | Ipv4 F1 | Ipv6 F1 | Jobarea F1 | Jobtitle F1 | Jobtype F1 | Lastname F1 | Litecoinaddress F1 | Mac F1 | Maskednumber F1 | Middlename F1 | Nearbygpscoordinate F1 | Ordinaldirection F1 | Password F1 | Phoneimei F1 | Phonenumber F1 | Pin F1 | Prefix F1 | Secondaryaddress F1 | Sex F1 | Ssn F1 | State F1 | Street F1 | Time F1 | Url F1 | Useragent F1 | Username F1 | Vehiclevin F1 | Vehiclevrm F1 | Zipcode F1 |
 - Transformers 4.35.2
 - Pytorch 2.1.0+cu118
 - Datasets 2.15.0
+- Tokenizers 0.15.0