response-toxicity-classifier-base

BERT classifier from Skoltech, finetuned on contextual data with 4 labels.

Training

Skoltech/russian-inappropriate-messages was finetuned on multiclass data with four classes (check the exact mapping between index and label in model.config):

  1. OK label: the message is OK in context and does not intend to offend or otherwise harm the reputation of the speaker.
  2. Toxic label: the message might be seen as offensive in the given context.
  3. Severe toxic label: the message is offensive, full of anger, and was written to provoke a fight or cause other discomfort.
  4. Risks label: the message touches on sensitive topics and can harm the reputation of the speaker (e.g. religion, politics).
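
In transformers, the index-to-label mapping of a classification model is exposed as `model.config.id2label`. The sketch below shows one way to turn a probability vector into a named label. The helper `top_label` and the mapping `example_id2label` are hypothetical and used only for illustration; the real mapping must be read from the loaded model's config.

```python
from typing import Mapping, Sequence, Tuple


def top_label(probas: Sequence[float], id2label: Mapping[int, str]) -> Tuple[str, float]:
    """Return the name and probability of the highest-scoring class."""
    best_idx = max(range(len(probas)), key=lambda i: probas[i])
    return id2label[best_idx], float(probas[best_idx])


# HYPOTHETICAL mapping, for illustration only.
# With a loaded model, pass model.config.id2label instead.
example_id2label = {0: "ok", 1: "toxic", 2: "severe_toxic", 3: "risks"}

print(top_label([0.05, 0.8, 0.1, 0.05], example_id2label))
```

Taking the arg-max is only one possible decision rule; for moderation it may be preferable to apply a separate threshold to each class.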

The model was finetuned on dialog datasets that will be posted soon.

Evaluation results

The model achieves the following per-class F1-scores on the validation datasets (to be posted soon):

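| Dataset          | OK (F1-score) | TOXIC (F1-score) | SEVERE TOXIC (F1-score) | RISKS (F1-score) |
|------------------|---------------|------------------|-------------------------|------------------|
| internet dialogs | 0.896         | 0.348            | 0.490                   | 0.591            |
| chatbot dialogs  | 0.940         | 0.295            | 0.729                   | 0.46             |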
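
For reference, the per-class F1-score is the harmonic mean of precision and recall, computed with one class treated as positive and all others as negative. The following is a minimal sketch in plain Python on made-up labels. It is not the evaluation script behind the numbers above, and the function name `f1_for_class` is hypothetical.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1-score for a single class in a multiclass setting (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)  # true positives
    fp = sum(1 for t, p in pairs if t != cls and p == cls)  # false positives
    fn = sum(1 for t, p in pairs if t == cls and p != cls)  # false negatives
    if tp == 0:
        # No correct predictions for this class: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up class indices, for illustration only.
y_true = [0, 0, 1, 1, 2, 3]
y_pred = [0, 1, 1, 1, 2, 0]
for cls in range(4):
    print(cls, round(f1_for_class(y_true, y_pred, cls), 3))
```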

Use in transformers

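import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the finetuned classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('tinkoff-ai/response-toxicity-classifier-base')
model = AutoModelForSequenceClassification.from_pretrained('tinkoff-ai/response-toxicity-classifier-base')
# Dialog context: "hi" / "hi!" / "how are you?"; response to classify: "fine, and you?"
# Special tokens are written by hand, hence add_special_tokens=False;
# truncation=True is required for max_length to take effect
inputs = tokenizer('[CLS]привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?', max_length=128, truncation=True, add_special_tokens=False, return_tensors='pt')
with torch.inference_mode():
    logits = model(**inputs).logits
    # Class probabilities for the single input, indexed as in model.config
    probas = torch.softmax(logits, dim=-1)[0].cpu().numpy()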
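
The input string in the snippet above appears to follow a fixed pattern: `[CLS]`, then the context turns joined by `[SEP]`, then `[RESPONSE_TOKEN]` followed by the response being classified. The hypothetical helper `build_input` below assembles such a string. It assumes the pattern generalizes to any number of context turns, which the single example here suggests but does not confirm.

```python
from typing import Sequence


def build_input(context: Sequence[str], response: str) -> str:
    """Join dialog context turns and the response into one model input string.

    Assumed layout (inferred from the single example in this card):
    [CLS]turn_1[SEP]turn_2[SEP]...[RESPONSE_TOKEN]response
    """
    return "[CLS]" + "[SEP]".join(context) + "[RESPONSE_TOKEN]" + response


text = build_input(["привет", "привет!", "как дела?"], "норм, у тя как?")
print(text)
```

The resulting string can be passed to the tokenizer with `add_special_tokens=False`, as in the snippet above, so that the special tokens are not inserted a second time.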

The work was done during an internship at Tinkoff by Nikita Stepanov, mentored by Alexander Markov.
