Tyler Ashworth committed on
Commit
e899a89
1 Parent(s): 2f94bb2

Initial commit

README.md ADDED
@@ -0,0 +1,47 @@
+ ---
+ language:
+ - en
+ metrics:
+ - f1
+ - accuracy
+ pipeline_tag: text-classification
+ widget:
+ - text: "Every woman wants to be a model. It's codeword for 'I get everything for free and people want me'"
+ ---
+ ### distilbert-base-sexism-detector
+ This is a fine-tuned version of distilbert-base-uncased on the Explainable Detection of Online Sexism (EDOS) dataset. It is intended to be used as a classification model for identifying tweets (0 - not sexist; 1 - sexist).
+
+ **This is a lightweight model with an 81.2 F1 score. Use this model for fast prediction via the online API; if you would like to use our best model, with an 86.3 F1 score, follow this [link](https://huggingface.co/NLP-LTU/BERTweet-large-sexism-detector).**
+
+ Classification examples (use these examples with the Hosted Inference API in the panel on the right):
+
+ |Prediction|Tweet|
+ |-----|--------|
+ |sexist |Every woman wants to be a model. It's codeword for "I get everything for free and people want me" |
+ |not sexist |basically I placed more value on her than I should then?|
+ # More Details
+ For more details about the datasets and evaluation results, see (we will update the page with our paper link)
+ # How to use
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
+
+ # Load the fine-tuned model and the matching base tokenizer
+ model = AutoModelForSequenceClassification.from_pretrained('NLP-LTU/distilbert-sexism-detector')
+ tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')
+ classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
+
+ prediction = classifier("Every woman wants to be a model. It's codeword for 'I get everything for free and people want me'")
+ # The pipeline returns a list of dicts; id2label in the config already maps
+ # class 0 to 'not sexist' and class 1 to 'sexist'
+ label_pred = prediction[0]['label']
+
+ print(label_pred)
+ ```
+ ```
+               precision    recall  f1-score   support
+
+   not sexist     0.9000    0.9264    0.9130      3030
+       sexist     0.7469    0.6784    0.7110       970
+
+     accuracy                         0.8662      4000
+    macro avg     0.8234    0.8024    0.8120      4000
+ weighted avg     0.8628    0.8662    0.8640      4000
+ ```
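The report above follows scikit-learn's `classification_report` format. A minimal sketch of how such a report could be reproduced, assuming `texts` and `gold_labels` hold a held-out labeled split (the names and data here are illustrative, not shipped with this repository):

```python
from sklearn.metrics import classification_report
from transformers import pipeline

# Hypothetical held-out data; replace with the real EDOS dev/test split
texts = ["example tweet one", "example tweet two"]
gold_labels = ["not sexist", "sexist"]

classifier = pipeline("text-classification", model="NLP-LTU/distilbert-sexism-detector")
pred_labels = [result["label"] for result in classifier(texts)]

print(classification_report(gold_labels, pred_labels, digits=4))
```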
config.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "_name_or_path": "distilbert-base-uncased-best-train-dev",
+   "activation": "gelu",
+   "architectures": [
+     "DistilBertForSequenceClassification"
+   ],
+   "attention_dropout": 0.1,
+   "dim": 768,
+   "dropout": 0.1,
+   "hidden_dim": 3072,
+   "id2label": {
+     "0": "not sexist",
+     "1": "sexist"
+   },
+   "initializer_range": 0.02,
+   "label2id": {
+     "not sexist": 0,
+     "sexist": 1
+   },
+   "max_position_embeddings": 512,
+   "model_type": "distilbert",
+   "n_heads": 12,
+   "n_layers": 6,
+   "pad_token_id": 0,
+   "problem_type": "single_label_classification",
+   "qa_dropout": 0.1,
+   "seq_classif_dropout": 0.2,
+   "sinusoidal_pos_embds": false,
+   "tie_weights_": true,
+   "torch_dtype": "float32",
+   "transformers_version": "4.22.2",
+   "vocab_size": 30522
+ }
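The `id2label`/`label2id` maps in this config are what let the `text-classification` pipeline return string labels directly, as in the README example. A minimal sketch for inspecting them:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("NLP-LTU/distilbert-sexism-detector")
print(config.id2label)   # {0: 'not sexist', 1: 'sexist'}
print(config.label2id)   # {'not sexist': 0, 'sexist': 1}
```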
model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4e1995f398a4ca179202ed8e92d78ba758857b8155a1280305adcf821c61d69f
+ size 267933570
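The repository also ships an ONNX export of the model. A minimal inference sketch with `onnxruntime`, assuming the `input_ids`/`attention_mask` graph input names produced by the standard transformers ONNX exporter (not verified against this particular file):

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Tokenize to numpy arrays, matching the assumed graph input names
inputs = tokenizer("basically I placed more value on her than I should then?", return_tensors="np")
logits = session.run(None, {"input_ids": inputs["input_ids"], "attention_mask": inputs["attention_mask"]})[0]

labels = ["not sexist", "sexist"]  # from id2label in config.json
print(labels[int(np.argmax(logits, axis=-1)[0])])
```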
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a717035cc3023d13b2c4bcb928442181327fed40de336c80caab21a14f4455f6
+ size 267852913
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "name_or_path": "distilbert-base-uncased",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "special_tokens_map_file": null,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "DistilBertTokenizer",
+   "unk_token": "[UNK]"
+ }
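This config mirrors the stock `distilbert-base-uncased` tokenizer: lowercasing is enabled and inputs longer than `model_max_length` (512) need truncation. A minimal sketch of what that means in practice, using the tokenizer shipped in this repository:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("NLP-LTU/distilbert-sexism-detector")

# do_lower_case: 'Model' and 'model' map to the same token ids
print(tokenizer("Model")["input_ids"] == tokenizer("model")["input_ids"])  # True

# model_max_length: long inputs are cut to 512 tokens when truncation is on
encoded = tokenizer("word " * 1000, truncation=True)
print(len(encoded["input_ids"]))  # 512
```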
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:353096663285e7ec1e521b466139f951bf625aa156dd1ca96c17e60636b29d57
+ size 3375
vocab.txt ADDED
The diff for this file is too large to render. See raw diff