---

license: cc-by-nc-nd-4.0
language:
- en

model-index:
  - name: roberta-large Image Prompt Classifier
    results:
      - task:
          type: text-classification
        dataset:
          name: nsfw-text-detection
          type: custom
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.93
          - name: Precision
            type: precision
            value: 0.88
          - name: Recall
            type: recall
            value: 0.90
---


# roberta-large Image Prompt Classifier

## Model Overview

This model is a fine-tuned version of `roberta-large` designed specifically for classifying image-generation prompts into three categories: SAFE, QUESTIONABLE, and UNSAFE. It builds on the `roberta-large` architecture to provide reliable classification of prompts used for generating images.

## Model Details

- **Model Name:** roberta-large Image Prompt Classifier
- **Base Model:** [roberta-large](https://huggingface.co/roberta-large)
- **Fine-tuned By:** Michał Młodawski
- **Categories:**
  - `0`: SAFE
  - `1`: QUESTIONABLE
  - `2`: UNSAFE

## Use Cases

This model is particularly useful for platforms and applications involving AI-generated content, where it is crucial to filter and classify prompts to maintain content safety and appropriateness. Some potential applications include:

- **Content Moderation:** Automatically classify and filter prompts to prevent the generation of inappropriate or harmful images.
- **User Safety:** Enhance user experience by ensuring that generated content adheres to safety guidelines.
- **Compliance:** Help platforms comply with regulatory requirements by identifying and flagging potentially unsafe prompts.
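
For moderation workflows like those above, the three class ids can be mapped to concrete actions. A minimal sketch, in which the `POLICY` table and action names are illustrative and not part of the model:

```python
# Map the model's class ids to moderation actions.
# 0 = SAFE, 1 = QUESTIONABLE, 2 = UNSAFE (see Model Details).
POLICY = {
    0: "allow",          # generate the image
    1: "human_review",   # queue the prompt for a moderator
    2: "block",          # refuse generation
}

def moderate(predicted_class: int) -> str:
    """Return the moderation action for a predicted class id."""
    # Unknown ids fall back to human review rather than silently passing.
    return POLICY.get(predicted_class, "human_review")

print(moderate(0))  # allow
print(moderate(2))  # block
```

Routing unknown ids to review is a conservative default; adjust the policy to your platform's guidelines.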

## How It Works

The model takes an input prompt and classifies it into one of three categories:

1. **SAFE:** Prompts that are deemed appropriate and free from harmful content.
2. **QUESTIONABLE:** Prompts that may require further review due to potential ambiguity or slight risk.
3. **UNSAFE:** Prompts that are likely to generate inappropriate or harmful content.

The classification is based on the semantic understanding and contextual analysis provided by the `roberta-large` architecture, fine-tuned on a curated dataset tailored for this specific task.
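
Concretely, the classifier emits one logit per category, and the predicted class is the argmax. A sketch of turning logits into a label and a confidence score, using a dummy logits tensor in place of a real model output (the `ID2LABEL` mapping follows the class ids listed under Model Details):

```python
import torch

ID2LABEL = {0: "SAFE", 1: "QUESTIONABLE", 2: "UNSAFE"}

# Dummy logits standing in for model(**inputs).logits (shape: [batch, 3])
logits = torch.tensor([[2.1, 0.3, -1.5]])

probs = torch.softmax(logits, dim=1)                # normalize to probabilities
predicted_class = torch.argmax(probs, dim=1).item()

print(ID2LABEL[predicted_class])                    # SAFE for these dummy logits
print(probs[0, predicted_class].item())             # confidence of the top class
```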

## Performance

Testing and validation were conducted to assess the model's performance. Key self-reported metrics:

- **Accuracy:** 93%
- **Precision:** 88%
- **Recall:** 90%

These metrics highlight the model's capability to effectively distinguish between different categories of prompts with high confidence.
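
As a refresher on what these figures measure, precision and recall follow from simple counts. A toy example with invented numbers, not the model's actual confusion matrix:

```python
# Toy counts for one class (illustrative only):
# tp = true positives, fp = false positives, fn = false negatives.
tp, fp, fn = 90, 12, 10

precision = tp / (tp + fp)   # of prompts flagged, how many were correct
recall = tp / (tp + fn)      # of truly unsafe prompts, how many were caught

print(round(precision, 2))   # 0.88
print(round(recall, 2))      # 0.9
```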

## Getting Started

To start using the roberta-large Image Prompt Classifier, you can easily integrate it into your existing workflows with the following steps:

### Installation

    pip install transformers==4.37.2
    pip install torch==2.3.1


### Usage

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    model_repo = "MichalMlodawski/nsfw-text-detection-large"

    # Load the tokenizer and fine-tuned classifier from the Hub
    tokenizer = AutoTokenizer.from_pretrained(model_repo)
    model = AutoModelForSequenceClassification.from_pretrained(model_repo)

    def classify_text(text):
        # Tokenize and truncate to the model's 512-token limit
        inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

        with torch.no_grad():
            outputs = model(**inputs)

        logits = outputs.logits
        # 0 = SAFE, 1 = QUESTIONABLE, 2 = UNSAFE
        predicted_class = torch.argmax(logits, dim=1).item()
        return predicted_class

    if __name__ == "__main__":
        examples = [
            {"text": "wild west cowboy caravan through Arizona Desert , Men Woman and children, horse drawn carriages", "expected_class": 0},
            {"text": "A person with questionable intentions in a dimly lit alley.", "expected_class": 1},
            {"text": "An inappropriate scene involving explicit content.", "expected_class": 2}
        ]

        for example in examples:
            text = example["text"]
            expected_class = example["expected_class"]
            predicted_class = classify_text(text)
            print(f"Prompt: {text}")
            print(f"Predicted class: {predicted_class}")
            print(f"Expected class: {expected_class}")
            print("---")


### Disclaimer

The dataset includes real image-generation prompts that may be perceived as abusive, offensive, or obscene. The examples and data may also contain unfavorable information about certain businesses. We only collect this data and do not assume any legal responsibility.
Please note: a portion of the data was created using Large Language Models (LLMs).