license: mit
datasets:
- ai4privacy/pii-masking-400k
language:
- en
- de
- fr
- it
- es
- nl
base_model:
- iiiorg/piiranha-v1-detect-personal-information
tags:
- NeuralWave
- Hackathon
Overview
This model detects personal information in text across multiple languages. It uses a reduced label set compared to its base model, with the aim of improving the precision and accuracy of the labeling.
Features
Improved Precision: The model uses fewer labels than the base model, which is intended to make labeling more precise and the identification of sensitive information more reliable.
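As an illustration of what a reduced label set means, the sketch below collapses fine-grained labels into a few broader categories. The label names and the mapping are hypothetical and chosen only for the example; the labels actually used by this model and its base model may differ.

```python
# Hypothetical fine-grained labels mapped to a smaller set; the real label
# names used by this model and its base model may differ.
REDUCED_LABELS = {
    "GIVENNAME": "NAME",
    "SURNAME": "NAME",
    "EMAIL": "CONTACT",
    "TELEPHONENUM": "CONTACT",
    "STREET": "ADDRESS",
    "CITY": "ADDRESS",
    "ZIPCODE": "ADDRESS",
}


def reduce_label(label: str) -> str:
    """Map a fine-grained label to its reduced category; "O" stays non-PII."""
    if label == "O":
        return "O"
    # Fine-grained labels missing from the mapping fall back to a generic category.
    return REDUCED_LABELS.get(label, "PII")


fine = ["O", "GIVENNAME", "SURNAME", "O", "EMAIL"]
coarse = [reduce_label(label) for label in fine]
print(coarse)
```

With fewer categories, the classifier has fewer ways to confuse closely related labels (for example a given name with a surname), which is one plausible reason a smaller label set can improve precision.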
Model Versions:
Maximum Accuracy Focus: This version aims to achieve the highest possible accuracy in the detection process, making it suitable for applications where minimizing errors is crucial.
Maximum Precision Focus: This variant is designed to maximize the precision of the detection, ideal for scenarios where false positives are particularly undesirable.
Installation
To run this model, install the dependencies (accelerate is required by the device_map='auto' option used in the usage example):
pip install torch transformers safetensors accelerate
Usage
Load and run the model using PyTorch and transformers:
import torch
from transformers import BertTokenizerFast, AutoModel
# Load the tokenizer
tokenizer = BertTokenizerFast.from_pretrained("google-bert/bert-base-multilingual-cased")
# Load the model ('model-path/...' is a placeholder for your checkpoint location)
model = AutoModel.from_pretrained('model-path/miniagent.pt', device_map='auto')
# Alternatively, for the precision-focused model
# model = AutoModel.from_pretrained('model-path/miniagent_precision', device_map='auto')
# Switch to inference mode (disables dropout)
model.eval()
# Example input
text = "Your sensitive information string"
# Tokenize and move the inputs to the same device as the model
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# Run the model without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
# Process outputs for analysis...
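One way to process the outputs is to turn per-token labels into masked text. The helper below is a minimal sketch, not part of this model's code: `mask_pii` is a hypothetical function, the label names are made up, and it assumes you already have one label per token (for example from the argmax over the logits of a token-classification head) plus character offsets, which fast tokenizers can return via `return_offsets_mapping=True`.

```python
def mask_pii(text, offsets, labels, placeholder="[{label}]"):
    """Replace every labeled character span in `text` with a placeholder.

    offsets: list of (start, end) character spans, one per token.
    labels:  list of label strings, one per token; "O" means not PII.
    """
    # Merge adjacent tokens that share the same non-"O" label into one span.
    spans = []
    for (start, end), label in zip(offsets, labels):
        # Skip non-PII tokens and empty spans (special tokens map to (0, 0)).
        if label == "O" or start == end:
            continue
        if spans and spans[-1][2] == label and start - spans[-1][1] <= 1:
            spans[-1] = (spans[-1][0], end, label)
        else:
            spans.append((start, end, label))
    # Replace from right to left so that earlier offsets stay valid.
    for start, end, label in reversed(spans):
        text = text[:start] + placeholder.format(label=label) + text[end:]
    return text


# Toy input with hand-written offsets and hypothetical labels.
text = "Contact Anna Schmidt at anna@example.com"
offsets = [(0, 7), (8, 12), (13, 20), (21, 23), (24, 40)]
labels = ["O", "NAME", "NAME", "O", "EMAIL"]
masked = mask_pii(text, offsets, labels)
print(masked)  # Contact [NAME] at [EMAIL]
```

Replacing spans from right to left avoids recomputing offsets after each substitution, since a replacement only shifts the characters that follow it.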
Evaluation
- Accuracy Model: Tuned to minimize overall errors and evaluated primarily on accuracy.
- Precision Model: Tuned to minimize false positives, for applications where precision matters most.
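The difference between the two objectives can be seen on a toy example. The sketch below computes token-level accuracy and precision for two made-up prediction sequences; the data is illustrative only and does not represent results from either model version, and `token_metrics` is a hypothetical helper.

```python
def token_metrics(gold, pred, negative="O"):
    """Return (accuracy, precision) over token labels.

    accuracy:  fraction of tokens whose predicted label equals the gold label.
    precision: fraction of tokens predicted as PII whose label is correct.
    """
    total = len(gold)
    correct = sum(g == p for g, p in zip(gold, pred))
    predicted_pos = sum(p != negative for p in pred)
    true_pos = sum(p != negative and g == p for g, p in zip(gold, pred))
    accuracy = correct / total if total else 0.0
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    return accuracy, precision


# Made-up labels: "cautious" flags little, "eager" flags a lot.
gold     = ["O", "NAME", "NAME", "O", "EMAIL", "O", "O", "O"]
cautious = ["O", "NAME", "O", "O", "O", "O", "O", "O"]
eager    = ["O", "NAME", "NAME", "NAME", "EMAIL", "EMAIL", "O", "O"]

print(token_metrics(gold, cautious))  # (0.75, 1.0)
print(token_metrics(gold, eager))     # (0.75, 0.6)
```

Both sequences reach the same accuracy, but the cautious one never raises a false positive while the eager one does, which is the trade-off the two model versions are meant to address.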