|
---
language:
- en
pipeline_tag: text-classification
license: mit
metrics:
- accuracy
- f1
---
|
This model is part of the research presented in "Mitigating Toxicity in Dialogue Agents through Adversarial Reinforcement Learning," a conference paper that addresses dialogue-agent toxicity at three levels: explicit, implicit, and contextual. Given a conversation history and a candidate response, the model predicts whether the response is toxic. It is designed for dialogue agents. To use it correctly, format the input according to the schema below:
|
|
|
`[HST]Hi, how are you?[END]I am doing fine[ANS]I hope you die.`
|
|
|
The `[HST]` token marks the start of the conversation history, each turn pair is separated by `[END]`, and `[ANS]` marks the start of the response to the last utterance. I will keep updating this card; these models are currently part of a larger project under development, so full results are not yet listed here.
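As a minimal usage sketch (not from the original card): the snippet below assembles an input following the schema above and scores it with the Hugging Face `transformers` library. The checkpoint name `MODEL_ID` is a placeholder for this repository's ID, the code assumes the special tokens are already registered in the tokenizer, and the assumption that index 1 is the "toxic" class should be checked against the model's `id2label` config.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder: replace with this repository's model ID on the Hub.
MODEL_ID = "<this-repo-id>"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

# Build the input: [HST] starts the history, [END] separates turn pairs,
# and [ANS] introduces the response to be classified.
history = ["Hi, how are you?", "I am doing fine"]
response = "I hope you die."
text = "[HST]" + "[END]".join(history) + "[ANS]" + response

inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Assumption: index 1 is the "toxic" class; check model.config.id2label.
toxic_prob = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"P(toxic) = {toxic_prob:.3f}")
```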
|
|
|
The model was trained on the Dialogue Safety dataset and the Bot-Adversarial Dialogue dataset.