+---
+license: mit
+language:
+- en
+---
+# Named entity recognition
+## Model Description
+This is a token classification model fine-tuned to recognize entities in sentences.
+It was fine-tuned on a custom dataset that targets specific entity types, including biased language in text.
+## Intended Use
+The model is intended for entity recognition tasks, especially identifying biased language in text passages.
+Given a sequence of text, the model flags the words, tokens, or **spans** that it associates with a particular entity or bias.
+## How to Use
+The model can be used for inference directly through the Hugging Face `transformers` library:
+```python
+import torch
+from transformers import AutoModelForTokenClassification, AutoTokenizer
+device = torch.device("cpu")
+# Load the tokenizer and model from the Hugging Face Hub
+tokenizer = AutoTokenizer.from_pretrained("newsmediabias/UnBIAS-NER")
+model = AutoModelForTokenClassification.from_pretrained("newsmediabias/UnBIAS-NER")
+# Keep the model on the same device as the inputs and disable dropout
+model.to(device)
+model.eval()
+def highlight_biased_entities(sentence):
+    """Return the sentence with words tagged B-BIAS/I-BIAS wrapped as BIAS[word]."""
+    inputs = tokenizer(sentence, return_tensors="pt", truncation=True).to(device)
+    with torch.no_grad():
+        logits = model(**inputs).logits
+    predictions = torch.argmax(logits, dim=2)[0].tolist()
+    # Read the tokens from the encoded ids so they line up one-to-one with the predictions
+    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
+    id2label = model.config.id2label
+    # Rebuild words from subword tokens; a word counts as biased if any of its pieces is
+    words = []  # each entry is [word_text, is_biased]
+    for token, prediction in zip(tokens, predictions):
+        if token in tokenizer.all_special_tokens:
+            continue  # skip special tokens such as [CLS] and [SEP]
+        is_biased = id2label[prediction] in ("B-BIAS", "I-BIAS")
+        if token.startswith("##") and words:
+            # Continuation piece: append it to the previous word
+            words[-1][0] += token[2:]
+            words[-1][1] = words[-1][1] or is_biased
+        else:
+            words.append([token, is_biased])
+    return " ".join(f"BIAS[{word}]" if biased else word for word, biased in words)
+sentence = "due to your evil and dishonest nature, i am kind of tired and want to get rid of such cheaters. all people like you are evil and a disgrace to society and I must say to get rid of immigrants as they are filthy to culture"
+highlighted_sentence = highlight_biased_entities(sentence)
+print(highlighted_sentence)
+```
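To work with whole **spans** rather than individual highlighted words, the `B-BIAS`/`I-BIAS` labels can be decoded into contiguous phrases. The following is a minimal sketch, assuming WordPiece-style `##` continuation pieces and the label names used in the example above. The helper `extract_bias_spans` and the hand-written token and label lists are hypothetical and are not output of the model.

```python
def extract_bias_spans(tokens, labels):
    """Group tokens tagged B-BIAS/I-BIAS into contiguous text spans."""
    spans, current = [], []
    for token, label in zip(tokens, labels):
        if token.startswith("##") and current:
            # A continuation piece always extends the word being built.
            current[-1] += token[2:]
        elif label == "B-BIAS" or (label == "I-BIAS" and not current):
            # B- opens a new span; a stray I- with no open span is treated as B-.
            if current:
                spans.append(" ".join(current))
            current = [token[2:] if token.startswith("##") else token]
        elif label == "I-BIAS":
            current.append(token)
        else:
            # Any other label closes the open span, if there is one.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Hand-written tokens and labels for illustration only (not model output).
tokens = ["[CLS]", "they", "are", "fil", "##thy", "and", "a",
          "dis", "##grace", "to", "society", "[SEP]"]
labels = ["O", "O", "O", "B-BIAS", "I-BIAS", "O", "O",
          "B-BIAS", "I-BIAS", "I-BIAS", "I-BIAS", "O"]
print(extract_bias_spans(tokens, labels))
# → ['filthy', 'disgrace to society']
```

In practice, `tokens` and `labels` would come from `tokenizer.convert_ids_to_tokens` and `model.config.id2label` applied to the model's predictions.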
+## Limitations and Biases
+Every model has limitations, and it's crucial to understand these when deploying models in real-world scenarios:
+1. **Training Data**: The model is trained on a specific dataset, and its predictions are only as good as the data it's trained on.
+2. **Generalization**: While the model may perform well on certain types of sentences or phrases, it might not generalize well to all types of text or contexts.
+It's also essential to be aware of any potential biases in the training data, which might affect the model's predictions.
+## Training Data
+The model was fine-tuned on a custom dataset. To request the dataset, contact **Shaina Raza [email protected]**.