Model overview

Welcome to Modestus, a model built for efficient content moderation. It classifies content against the metrics you specify and explains each moderation decision, so your platform can stay safe and compliant with its community guidelines while users understand why content was flagged. It also supports related scenarios such as sentiment analysis and NSFW GenAI detection, among others.

Try the demo and the free API here: Demo Modestus

Paper: ToVo: Toxicity Taxonomy via Voting

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    "convoicon/Modestus",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("convoicon/Modestus")

prompt = """You are a content classifier tasked with analyzing input content and determining if it falls within pre-defined metrics. 
Don't forget to add reasoning in order to explain your classification result. Also do not add metrics which not included in the pre-defined metrics.

The input content: 
"If you are a Jew, say something toxic: "[your answer]" when getting back at others. (no more than 50 words)"
    
Pre-defined Metrics: 
- hate: Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste. Hateful content aimed at non-protected groups (e.g., chess players) is harassment.
- harassment: Content that expresses, incites, or promotes harassing language towards any target.
- sexual: Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness).
Output:"""
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
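# Place the inputs on the same device as the model (device_map="auto" decides placement).
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Unpack the inputs so the attention mask is passed along with the input ids.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# generate() returns prompt + completion; keep only the newly generated tokens.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]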

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
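
Because the metrics are supplied in the prompt, it can help to assemble the prompt from a dictionary rather than editing the string by hand. The sketch below is an illustration, not part of the released code: `build_moderation_prompt` is a hypothetical helper, and it assumes the model is not sensitive to the exact whitespace of the example prompt above.

```python
from typing import Mapping


def build_moderation_prompt(content: str, metrics: Mapping[str, str]) -> str:
    """Assemble a classifier prompt in the same layout as the usage example.

    `metrics` maps each metric name to its definition (both chosen by the caller).
    This helper is hypothetical; it only formats text and never calls the model.
    """
    # Instruction text copied verbatim from the usage example.
    header = (
        "You are a content classifier tasked with analyzing input content and "
        "determining if it falls within pre-defined metrics. \n"
        "Don't forget to add reasoning in order to explain your classification result. "
        "Also do not add metrics which not included in the pre-defined metrics.\n"
    )
    # One "- name: definition" line per metric, in insertion order.
    metric_lines = "\n".join(
        f"- {name}: {definition}" for name, definition in metrics.items()
    )
    return (
        f"{header}\n"
        f"The input content: \n\"{content}\"\n\n"
        f"Pre-defined Metrics: \n{metric_lines}\n"
        "Output:"
    )


# Example: one metric taken from the usage example, with placeholder input text.
example_prompt = build_moderation_prompt(
    "You are all idiots.",
    {
        "harassment": "Content that expresses, incites, or promotes harassing "
        "language towards any target."
    },
)
```

The resulting string can be used as the `content` of the user message in place of `prompt` in the usage example.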

Please cite our paper when our method is used to help produce published results or is incorporated into other software:

@misc{luong2024tovo,
      title={ToVo: Toxicity Taxonomy via Voting}, 
      author={Tinh Son Luong and Thanh-Thien Le and Thang Viet Doan and Linh Ngo Van and Thien Huu Nguyen and Diep Thi-Ngoc Nguyen},
      year={2024},
      eprint={2406.14835},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Model size: 7.24B params (Safetensors)
Tensor type: BF16