---
license: openrail++
datasets:
- ukr-detect/ukr-toxicity-dataset-seminatural
language:
- uk
widget:
- text: Ти неймовірна!
---

## Binary toxicity classifier for Ukrainian

This model is an instance of ["xlm-roberta-base"](https://huggingface.co/xlm-roberta-base) fine-tuned on the semi-automatically collected [Ukrainian toxicity classification dataset](https://huggingface.co/datasets/ukr-detect/ukr-toxicity-dataset).

Evaluation metrics for binary toxicity classification on the test set:

| Metric    | Value |
|-----------|-------|
| F1-score  | 0.99  |
| Precision | 0.99  |
| Recall    | 0.99  |
| Accuracy  | 0.99  |
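
For readers who want to reproduce this kind of table on their own data, the following is a minimal sketch of how the four metrics are usually computed for a binary task. It assumes the toxic class is the positive class and uses hypothetical toy labels; the card does not state the averaging scheme or publish per-example predictions, so the numbers below are illustrative only.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1 and accuracy for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical toy labels (1 = toxic, 0 = non-toxic); NOT the real test set.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```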


## How to use

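```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub.
classifier = pipeline("text-classification",
                      model="ukr-detect/ukr-toxicity-classifier")

# Classify a sentence; the result is a list of dicts with "label" and "score".
print(classifier("Ти неймовірна!"))
```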
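
A `text-classification` pipeline returns one dict with `label` and `score` keys per input text. The sketch below shows one possible way to turn such output into boolean flags. The helper `flag_toxic`, the label name `"LABEL_1"`, the threshold, and the sample predictions are all hypothetical; the actual label names of this model should be checked in `classifier.model.config.id2label` before relying on them.

```python
def flag_toxic(texts, predictions, toxic_label="LABEL_1", threshold=0.5):
    """Pair each text with a boolean toxicity flag.

    `toxic_label` is an ASSUMED label name; verify it against the
    model's config (id2label) before use.
    """
    flagged = []
    for text, pred in zip(texts, predictions):
        is_toxic = pred["label"] == toxic_label and pred["score"] >= threshold
        flagged.append((text, is_toxic))
    return flagged


# Hypothetical pipeline-style output, written by hand for illustration;
# these are NOT real predictions from the model.
sample_texts = ["text a", "text b"]
sample_predictions = [
    {"label": "LABEL_0", "score": 0.98},
    {"label": "LABEL_1", "score": 0.91},
]

flags = flag_toxic(sample_texts, sample_predictions)
print(flags)
```

With a loaded model, `predictions` would come from calling `classifier(sample_texts)` on a list of strings.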

## Citation

```
@article{dementieva2024toxicity,
  title={Toxicity Classification in Ukrainian},
  author={Dementieva, Daryna and Khylenko, Valeriia and Babakov, Nikolay and Groh, Georg},
  journal={arXiv preprint arXiv:2404.17841},
  year={2024}
}
```