---
license: mit
language:
- en
---

# Named Entity Recognition

## Model Description

This model is a fine-tuned token-classification model that predicts entity spans in sentences. It was fine-tuned on a custom dataset focused on identifying certain types of entities, including biased language in text.
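
Before running inference, you can inspect the label scheme the checkpoint was fine-tuned with; the `B-BIAS` / `I-BIAS` tags used in the example below come from the model config's `id2label` mapping. A minimal sketch using only the standard `transformers` API:

```python
from transformers import AutoConfig

# Inspect the tag set the model predicts
config = AutoConfig.from_pretrained("newsmediabias/UnBIAS-NER")
print(config.id2label)  # expected to include B-BIAS / I-BIAS, per the inference example below
```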

## Intended Use

The model is intended for entity-recognition tasks, especially identifying biases in text passages. Given an input sequence of text, the model highlights the words, tokens, or **spans** it believes are associated with a particular entity or bias.

## How to Use

The model can be used for inference directly through the Hugging Face `transformers` library:

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch

device = torch.device("cpu")

# Load the tokenizer and model, and put the model in inference mode
tokenizer = AutoTokenizer.from_pretrained("newsmediabias/UnBIAS-NER")
model = AutoModelForTokenClassification.from_pretrained("newsmediabias/UnBIAS-NER")
model.to(device)
model.eval()

def highlight_biased_entities(sentence):
    # Encode once and keep the subword tokens aligned with the model inputs
    inputs = tokenizer.encode(sentence, return_tensors="pt").to(device)
    tokens = tokenizer.convert_ids_to_tokens(inputs[0])

    with torch.no_grad():
        logits = model(inputs).logits
    predictions = torch.argmax(logits, dim=2)

    id2label = model.config.id2label

    # Reconstruct words from subword tokens and wrap biased spans in BIAS[...]
    highlighted_sentence = ""
    current_word = ""
    is_biased = False
    for token, prediction in zip(tokens, predictions[0]):
        if token in tokenizer.all_special_tokens:
            continue  # skip [CLS], [SEP], etc.
        label = id2label[prediction.item()]
        if label in ['B-BIAS', 'I-BIAS']:
            if token.startswith('##'):
                current_word += token[2:]
            else:
                # Flush the previous word before starting a new biased one
                if current_word:
                    if is_biased:
                        highlighted_sentence += f"BIAS[{current_word}] "
                    else:
                        highlighted_sentence += f"{current_word} "
                current_word = token
                is_biased = True
        else:
            if current_word:
                if is_biased:
                    highlighted_sentence += f"BIAS[{current_word}] "
                else:
                    highlighted_sentence += f"{current_word} "
                current_word = ""
            highlighted_sentence += f"{token} "
            is_biased = False
    # Flush the final word
    if current_word:
        if is_biased:
            highlighted_sentence += f"BIAS[{current_word}]"
        else:
            highlighted_sentence += current_word

    # Clean up spacing and any leftover subword markers
    highlighted_sentence = highlighted_sentence.replace(' [', '[').replace(' ]', ']').replace(' ##', '')

    return highlighted_sentence

sentence = "due to your evil and dishonest nature, i am kind of tired and want to get rid of such cheapters. all people like you are evil and a disgrace to society and I must say to get rid of immigrants as they are filthy to culture"
highlighted_sentence = highlight_biased_entities(sentence)
print(highlighted_sentence)
```
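
For a shorter route, the `token-classification` pipeline can handle subword merging for you. This is a minimal sketch assuming the checkpoint's labels follow the standard `B-`/`I-` scheme shown above; `aggregation_strategy="simple"` groups subword tokens into word-level spans:

```python
from transformers import pipeline

# Token-classification pipeline; aggregation merges subwords into spans
ner = pipeline(
    "token-classification",
    model="newsmediabias/UnBIAS-NER",
    aggregation_strategy="simple",
)

for span in ner("all people like you are evil and a disgrace to society"):
    print(span["entity_group"], span["word"], round(span["score"], 3))
```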

## Limitations and Biases

Every model has limitations, and it is crucial to understand them before deploying the model in real-world scenarios:

1. **Training Data**: The model was trained on a specific dataset, and its predictions are only as good as that data.
2. **Generalization**: While the model may perform well on certain types of sentences or phrases, it might not generalize to all types of text or contexts.

It is also essential to be aware of potential biases in the training data itself, which may affect the model's predictions.

## Training Data

The model was fine-tuned on a custom dataset. Contact **Shaina Raza** ([email protected]) for access to the dataset.