
metadata

license: mit
datasets:
  - ai4privacy/pii-masking-400k
language:
  - en
  - de
  - fr
  - it
  - es
  - nl
base_model:
  - iiiorg/piiranha-v1-detect-personal-information
tags:
  - NeuralWave
  - Hackathon

Overview

This model detects personal information in text using a smaller label set than its base model, iiiorg/piiranha-v1-detect-personal-information. Reducing the number of labels is intended to improve labeling precision and accuracy across the supported languages (English, German, French, Italian, Spanish, and Dutch).

Features

Improved Precision: The label set is smaller than the base model's, which is intended to make labeling more precise and the identification of sensitive information more reliable.
Model Versions:
Maximum Accuracy Focus: This version aims to achieve the highest possible accuracy in the detection process, making it suitable for applications where minimizing errors is crucial.
Maximum Precision Focus: This variant is designed to maximize the precision of the detection, ideal for scenarios where false positives are particularly undesirable.

Installation

To run this model, install the dependencies (accelerate is required by the device_map='auto' option used in the usage example):

pip install torch transformers safetensors accelerate

Usage

Load and run the model using PyTorch and transformers:

import torch
from transformers import AutoModel, BertTokenizerFast

# Load the tokenizer
tokenizer = BertTokenizerFast.from_pretrained("google-bert/bert-base-multilingual-cased")

# Load the model. 'model-path/...' is a placeholder: from_pretrained expects a
# local model directory or a Hub repo id. device_map='auto' needs accelerate.
model = AutoModel.from_pretrained('model-path/miniagent.pt', device_map='auto')
# Alternatively, for the precision-focused model
# model = AutoModel.from_pretrained('model-path/miniagent_precision', device_map='auto')
model.eval()  # inference mode (disables dropout)

# Example input
text = "Your sensitive information string"

# Tokenize, move the tensors to the model's device, and run without gradients
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model(**inputs)

# Process outputs for analysis...
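
The output-processing step is left open above. As a rough sketch of one possibility, the snippet below masks labelled character spans in the input text. It assumes the model yields one label per token (for example from a token-classification head, which the usage code above does not confirm) and that token character offsets are available (for example via the tokenizer's `return_offsets_mapping=True`). The function `mask_pii`, the label names `NAME` and `EMAIL`, and the hand-written offsets are hypothetical and stand in for real model output.

```python
def mask_pii(text, offsets, labels, negative_label="O"):
    """Replace labelled character spans with [LABEL] placeholders.

    offsets: list of (start, end) character positions, one per token.
    labels:  list of label strings, one per token (hypothetical label names).
    """
    spans = []
    for (start, end), label in zip(offsets, labels):
        if label == negative_label or start == end:
            continue  # skip non-PII tokens and special tokens with empty offsets
        if spans and spans[-1][2] == label and start - spans[-1][1] <= 1:
            # Same label and adjacent (at most one character apart): extend the span
            spans[-1] = (spans[-1][0], end, label)
        else:
            spans.append((start, end, label))

    pieces, cursor = [], 0
    for start, end, label in spans:
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(f"[{label}]")        # placeholder for the masked span
        cursor = end
    pieces.append(text[cursor:])           # remaining text after the last span
    return "".join(pieces)


# Hand-written stand-ins for tokenizer offsets and predicted labels
text = "Contact Anna Meier at anna@example.com"
offsets = [(0, 0), (0, 7), (8, 12), (13, 18), (19, 21), (22, 38), (0, 0)]
labels = ["O", "O", "NAME", "NAME", "O", "EMAIL", "O"]

masked = mask_pii(text, offsets, labels)
print(masked)
```

Merging adjacent tokens that share a label keeps multi-token entities such as full names under a single placeholder rather than one placeholder per token.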

Evaluation

Accuracy Model: Tuned to minimize overall errors, with accuracy as the target metric.
Precision Model: Tuned to minimize false positives, with precision as the target metric.
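
To make the difference between the two targets concrete, here is a minimal sketch of how token-level accuracy and precision could be computed. The function `token_metrics`, the label names, and the toy sequences are illustrative assumptions, not the evaluation procedure or results of this model.

```python
def token_metrics(gold, pred, negative_label="O"):
    """Return (accuracy, precision) for token-level labels.

    accuracy:  share of all tokens whose predicted label matches the gold label.
    precision: share of tokens predicted as PII (any label other than
               negative_label) whose predicted label matches the gold label.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    correct = sum(g == p for g, p in zip(gold, pred))
    accuracy = correct / len(gold) if gold else 0.0

    # Only tokens the model flagged as PII count towards precision
    flagged = [(g, p) for g, p in zip(gold, pred) if p != negative_label]
    true_positives = sum(g == p for g, p in flagged)
    precision = true_positives / len(flagged) if flagged else 0.0

    return accuracy, precision


# Toy example with hypothetical labels
gold = ["O", "NAME", "NAME", "O", "EMAIL", "O", "O", "O"]
pred = ["O", "NAME", "O", "O", "EMAIL", "EMAIL", "O", "O"]

accuracy, precision = token_metrics(gold, pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f}")
```

In this toy case the missed `NAME` token lowers accuracy but not precision, while the spurious `EMAIL` prediction lowers both, which is why a precision-focused variant suits applications where false positives are the main concern.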