dougtrajano committed
Commit a21227a
1 Parent(s): 3b673f0

Update README.md

Files changed (1)
  1. README.md +49 -23
README.md CHANGED
@@ -23,31 +23,63 @@ should probably proofread and complete it, then remove this comment. -->

  # dougtrajano/toxicity-target-classification

- This model is a fine-tuned version of [neuralmind/bert-base-portuguese-cased](https://huggingface.co/neuralmind/bert-base-portuguese-cased) on the OLID-BR dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.6110
- - Accuracy: 0.6864
- - F1: 0.6872
- - Precision: 0.6882
- - Recall: 0.6864
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed

  ## Training procedure

  ### Training hyperparameters

  The following hyperparameters were used during training:
  - learning_rate: 4.174021560583183e-05
  - train_batch_size: 8
  - eval_batch_size: 8
@@ -57,19 +89,13 @@ The following hyperparameters were used during training:
  - num_epochs: 30
  - label_smoothing_factor: 0.09936835309930625

- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 | Precision | Recall |
- |:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|
- | 0.633 | 1.0 | 537 | 0.6040 | 0.6919 | 0.5805 | 0.6351 | 0.6919 |
- | 0.5915 | 2.0 | 1074 | 0.6110 | 0.6864 | 0.6872 | 0.6882 | 0.6864 |
- | 0.4584 | 3.0 | 1611 | 0.7104 | 0.6933 | 0.6606 | 0.6605 | 0.6933 |
- | 0.3564 | 4.0 | 2148 | 0.9816 | 0.6168 | 0.6307 | 0.6671 | 0.6168 |
-
-
  ### Framework versions

  - Transformers 4.26.0
  - Pytorch 1.10.2+cu113
  - Datasets 2.9.0
  - Tokenizers 0.13.2

  # dougtrajano/toxicity-target-classification

+ Toxicity Target Classification is a model that classifies whether a given text is targeted or not.
+
+ This BERT model is a fine-tuned version of [neuralmind/bert-base-portuguese-cased](https://huggingface.co/neuralmind/bert-base-portuguese-cased) on the [OLID-BR dataset](https://huggingface.co/datasets/dougtrajano/olid-br).
+
+ ## Overview
+
+ **Input:** Text in Brazilian Portuguese
+
+ **Output:** Binary classification (targeted or untargeted)
+
+ ## Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ # Load the fine-tuned tokenizer and classification model from the Hugging Face Hub
+ tokenizer = AutoTokenizer.from_pretrained("dougtrajano/toxicity-target-classification")
+ model = AutoModelForSequenceClassification.from_pretrained("dougtrajano/toxicity-target-classification")
+ ```
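
Continuing the snippet above, a minimal inference sketch; the example sentence is invented, and the label name is read from the model config rather than assumed:

```python
import torch

# Invented Brazilian Portuguese example; replace with your own text
text = "Você é um idiota!"

# Tokenize and score the text with the classifier loaded above
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring class index back to its name in the model config
predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])
```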
+
+ ## Limitations and bias
+
+ The following factors may degrade the model’s performance.
+
+ **Text Language**: The model was trained on Brazilian Portuguese texts, so it may not work well with other varieties of Portuguese, such as European Portuguese.
+
+ **Text Origin**: The model was trained on texts from social media and a few texts from other sources, so it may not work well on other kinds of text.
+
+ ## Trade-offs
+
+ Models sometimes exhibit performance issues under particular circumstances. This section describes situations in which the model may perform less than optimally, so that you can plan accordingly.
+
+ **Text Length**: The model was fine-tuned on texts with a word count between 1 and 178 words (average of 18 words). It may give poor results on texts whose word count falls outside this range.
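
One hypothetical way to plan for this caveat is a simple length pre-filter; the function below is illustrative only, with its bounds taken from the numbers above:

```python
# Illustrative guard: flag texts outside the 1-178 word range seen in fine-tuning
def within_training_length(text: str) -> bool:
    word_count = len(text.split())
    return 1 <= word_count <= 178
```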
+
+ ## Performance
+
+ The model was evaluated on the test set of the [OLID-BR](https://dougtrajano.github.io/olid-br/) dataset. The overall metrics below are weighted averages over the two classes.
+
+ **Accuracy:** 0.6864
+
+ **Precision:** 0.6882
+
+ **Recall:** 0.6864
+
+ **F1-Score:** 0.6872
+
+ | Class | Precision | Recall | F1-Score | Support |
+ | :---: | :-------: | :----: | :------: | :-----: |
+ | `UNTARGETED` | 0.4912 | 0.5011 | 0.4961 | 443 |
+ | `TARGETED INSULT` | 0.7759 | 0.7688 | 0.7723 | 995 |
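
A per-class report in this shape can be produced with scikit-learn; this is only a sketch, with placeholder labels instead of real OLID-BR predictions and assumed class names:

```python
from sklearn.metrics import classification_report

# Placeholder gold labels and predictions; in practice these would come from
# running the model over the OLID-BR test split (0 = untargeted, 1 = targeted)
y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]

# digits=4 matches the table above; the report includes per-class rows and
# the weighted averages used for the overall metrics
print(classification_report(y_true, y_pred, target_names=["UNTARGETED", "TARGETED"], digits=4))
```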

  ## Training procedure

  ### Training hyperparameters

  The following hyperparameters were used during training:
+
  - learning_rate: 4.174021560583183e-05
  - train_batch_size: 8
  - eval_batch_size: 8
 
  - num_epochs: 30
  - label_smoothing_factor: 0.09936835309930625
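
For orientation, a sketch of how these values might map onto `transformers.TrainingArguments`; `output_dir`, the per-device reading of the batch sizes, and any hyperparameter not listed above are assumptions:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction from the listed values; settings hidden by the
# diff's elided lines are intentionally left out
training_args = TrainingArguments(
    output_dir="output",  # assumption: not stated in the card
    learning_rate=4.174021560583183e-05,
    per_device_train_batch_size=8,  # assuming batch sizes are per device
    per_device_eval_batch_size=8,
    num_train_epochs=30,
    label_smoothing_factor=0.09936835309930625,
)
```
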
  ### Framework versions

  - Transformers 4.26.0
  - Pytorch 1.10.2+cu113
  - Datasets 2.9.0
  - Tokenizers 0.13.2
+
+ ## Provide Feedback
+
+ If you have any feedback on this model, please [open an issue](https://github.com/DougTrajano/ToChiquinho/issues/new) on GitHub.