---
license: mit
datasets:
- ai4privacy/pii-masking-400k
language:
- en
- de
- fr
- it
- es
- nl
base_model:
- iiiorg/piiranha-v1-detect-personal-information
tags:
- NeuralWave
- Hackathon
---
## Overview

This model detects personal information in text using a smaller label set than its base model. The reduced label set is intended to improve labeling precision when identifying personal information across multiple languages.

---

## Features

- **Improved Precision**: The model uses fewer labels than the base model, which is intended to make labeling more precise and the identification of sensitive information more reliable.

- **Model Versions**:
  - **Maximum Accuracy Focus**: This version aims to achieve the highest possible accuracy in the detection process, making it suitable for applications where minimizing errors is crucial.
  - **Maximum Precision Focus**: This variant is designed to maximize the precision of the detection, ideal for scenarios where false positives are particularly undesirable.
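As a rough illustration of why a smaller label set can make labeling more precise, the sketch below merges closely related labels so that confusing one with the other no longer counts as an error. The label names and the grouping in `COARSE_LABEL` are hypothetical and are not taken from this model's actual label mapping.

```python
# Hypothetical mapping from fine-grained labels to a smaller label set.
# Both the label names and the grouping are illustrative assumptions.
COARSE_LABEL = {
    "I-GIVENNAME": "I-NAME",
    "I-SURNAME": "I-NAME",
    "I-STREET": "I-ADDRESS",
    "I-CITY": "I-ADDRESS",
}


def collapse(labels):
    """Map each label to its coarse group; unmapped labels are kept as-is."""
    return [COARSE_LABEL.get(label, label) for label in labels]


def count_matches(gold, predicted):
    """Count positions where the predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, predicted))


# Toy example: the predictions confuse given name with surname and city with street.
gold = ["I-GIVENNAME", "I-SURNAME", "O", "I-CITY"]
predicted = ["I-SURNAME", "I-SURNAME", "O", "I-STREET"]

fine_matches = count_matches(gold, predicted)
coarse_matches = count_matches(collapse(gold), collapse(predicted))
print(f"fine-grained matches: {fine_matches}/{len(gold)}")
print(f"reduced-label matches: {coarse_matches}/{len(gold)}")
```

In this toy case the two confusions happen within a group, so they disappear once the labels are merged. Confusions across groups would still count as errors.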

---

## Installation

To run this model, install the dependencies:

```bash
pip install torch transformers safetensors
```

---

## Usage


Load and run the model using PyTorch and transformers:

```python
import json

from safetensors.torch import load_file
from transformers import AutoConfig, AutoModelForTokenClassification, BertTokenizerFast

# Load the config from the model folder
config = AutoConfig.from_pretrained("folder_to_model")

# Initialize the model architecture from the config (weights are still random here)
model = AutoModelForTokenClassification.from_config(config)

# Load the safetensors weights (load_file expects the path to a .safetensors file)
state_dict = load_file("folder_to_tensors")

# Load the state dict into the model and switch to inference mode
model.load_state_dict(state_dict)
model.eval()

# Load the tokenizer
tokenizer = BertTokenizerFast.from_pretrained("google-bert/bert-base-multilingual-cased")


class LabelMapper:
    """Minimal stand-in for the project's label mapper; it only holds the fields set below."""


# Load the label mapper if needed
with open("pii_model/label_mapper.json", "r") as f:
    label_mapper_data = json.load(f)

label_mapper = LabelMapper()
label_mapper.label_to_id = label_mapper_data["label_to_id"]
# JSON object keys are always strings, so convert the ids back to int
label_mapper.id_to_label = {int(k): v for k, v in label_mapper_data["id_to_label"].items()}
label_mapper.num_labels = label_mapper_data["num_labels"]

# Process outputs for analysis...
```
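The output-processing step is left open above. One possible way to do it, shown here as a minimal sketch, is to pick the highest-scoring label for each token. The function name `decode_predictions` and the toy tokens, scores, and labels are hypothetical. With the real model, the tokens could come from `tokenizer.convert_ids_to_tokens(...)` and the scores from `model(**inputs).logits[0].tolist()`.

```python
def decode_predictions(tokens, scores, id_to_label):
    """Pair each token with the label whose score is highest."""
    decoded = []
    for token, token_scores in zip(tokens, scores):
        # Index of the largest score for this token (argmax)
        best_id = max(range(len(token_scores)), key=token_scores.__getitem__)
        decoded.append((token, id_to_label[best_id]))
    return decoded


# Hypothetical toy inputs: three tokens scored over a two-label set.
tokens = ["Anna", "lives", "here"]
scores = [[0.1, 2.3], [1.9, 0.2], [1.2, -0.5]]
id_to_label = {0: "O", 1: "I-GIVENNAME"}

predictions = decode_predictions(tokens, scores, id_to_label)
print(predictions)
```

The `id_to_label` argument has the same shape as `label_mapper.id_to_label` in the loading code, so the loaded mapping can be passed in directly.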

---

## Evaluation

- **Accuracy Model**: Tuned for the highest overall accuracy, that is, the fewest labeling errors of any kind.
- **Precision Model**: Tuned for the highest precision, that is, the fewest false positives, for applications where wrongly flagged text is costly.
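To make the difference between the two targets concrete, the sketch below computes token-level accuracy and precision on a toy example. This card does not state how the metrics were computed, so the token-level definitions, the function names, and the `"O"` outside label are assumptions for illustration.

```python
def accuracy(gold, predicted):
    """Share of all tokens whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def pii_precision(gold, predicted, outside="O"):
    """Of the tokens flagged as personal information, the share labeled correctly."""
    flagged = [(g, p) for g, p in zip(gold, predicted) if p != outside]
    if not flagged:
        return 0.0  # nothing was flagged, so precision is reported as 0 here
    return sum(g == p for g, p in flagged) / len(flagged)


# Toy example: one false positive (position 1) and one missed entity (position 4).
gold = ["O", "O", "I-EMAIL", "O", "I-EMAIL"]
predicted = ["O", "I-EMAIL", "I-EMAIL", "O", "O"]

print("accuracy:", accuracy(gold, predicted))
print("precision:", pii_precision(gold, predicted))
```

A missed entity lowers accuracy but not precision, while a false positive lowers both. That is why a precision-focused variant can flag fewer tokens overall.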

---