
response-toxicity-classifier-base

A BERT classifier from Skoltech, fine-tuned on contextual data with four labels.

Training

Skoltech/russian-inappropriate-messages was fine-tuned on multiclass data with four classes (check the exact mapping between idx and label in model.config; a short sketch for doing so follows the list below).

  1. OK label — the message is acceptable in context and does not intend to offend or otherwise harm the speaker's reputation.
  2. Toxic label — the message might be seen as offensive in the given context.
  3. Severe toxic label — the message is offensive, full of anger, and was written to provoke a fight or other discomfort.
  4. Risks label — the message touches on sensitive topics (e.g. religion, politics) and can harm the speaker's reputation.
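
A minimal sketch for inspecting that mapping (the checkpoint name is taken from the usage example below; the exact label names live in the config):

from transformers import AutoConfig

config = AutoConfig.from_pretrained('tinkoff-ai/response-toxicity-classifier-base')
# id2label maps each class index to its label name, as defined in the config
print(config.id2label)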

The model was fine-tuned on dialog datasets that will be posted soon.

Evaluation results

The model achieves the following F1 scores on the validation datasets (to be posted soon):

Dataset            OK (F1)   TOXIC (F1)   SEVERE TOXIC (F1)   RISKS (F1)
internet dialogs   0.896     0.348        0.490               0.591
chatbot dialogs    0.940     0.295        0.729               0.460
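
Not the authors' evaluation code, but a generic sketch of how per-class F1 scores like these can be computed with scikit-learn (the label ids below are hypothetical):

from sklearn.metrics import f1_score

y_true = [0, 1, 2, 3, 0, 3]  # hypothetical gold class ids
y_pred = [0, 1, 1, 3, 0, 2]  # hypothetical model predictions
# average=None returns one F1 score per class, matching the columns above
print(f1_score(y_true, y_pred, average=None))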

Use in transformers

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained('tinkoff-ai/response-toxicity-classifier-base')
model = AutoModelForSequenceClassification.from_pretrained('tinkoff-ai/response-toxicity-classifier-base')

# The dialog context is joined with [CLS]/[SEP] by hand and the response to
# classify is prefixed with [RESPONSE_TOKEN], hence add_special_tokens=False.
inputs = tokenizer('[CLS]привет[SEP]привет![SEP]как дела?[RESPONSE_TOKEN]норм, у тя как?', max_length=128, add_special_tokens=False, return_tensors='pt')
with torch.inference_mode():
    logits = model(**inputs).logits
    # Probabilities over the four labels described above
    probas = torch.softmax(logits, dim=-1)[0].cpu().detach().numpy()
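
Continuing the snippet, the predicted class can be mapped back to its label name through the config (a short usage sketch; the exact label names come from model.config):

pred_id = int(probas.argmax())
# id2label is the idx -> label mapping shipped with the checkpoint
print(model.config.id2label[pred_id], probas[pred_id])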

The work was done by Nikita Stepanov during an internship at Tinkoff, mentored by Alexander Markov.
