This model card documents the demo paper "PEACE: Providing Explanations and Analysis for Combating Hate Expressions" accepted at the 27th European Conference on Artificial Intelligence: https://www.ecai2024.eu/calls/demos.
The Model
This model is a hate speech detector fine-tuned specifically for detecting implicit hate speech. It is based on the paper "PEACE: Providing Explanations and Analysis for Combating Hate Expressions" by Greta Damo, Nicolás Benjamín Ocampo, Elena Cabrio, and Serena Villata, presented at the 27th European Conference on Artificial Intelligence.
Training Parameters and Experimental Info
The model was trained on the ISHate dataset, focusing on its implicit hate speech data, with the following parameters:
- Batch size: 32
- Weight decay: 0.01
- Epochs: 4
- Learning rate: 2e-5
For detailed information on the training process, please refer to the model's paper.
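As a rough illustration, the hyperparameters listed above can be collected in a small configuration object. This is only a sketch: the `TrainingConfig` class, the `total_steps` helper, and the dataset size of 1000 are hypothetical and do not come from the paper or the training code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters from this model card (the class itself is hypothetical)."""
    batch_size: int = 32
    weight_decay: float = 0.01
    epochs: int = 4
    learning_rate: float = 2e-5

    def total_steps(self, num_examples: int) -> int:
        """Optimizer steps, assuming one step per batch and a kept partial final batch."""
        return math.ceil(num_examples / self.batch_size) * self.epochs


config = TrainingConfig()
# 1000 is a made-up dataset size, used only to show the arithmetic.
print(config.total_steps(num_examples=1000))  # 128
```

With 1000 examples and a batch size of 32 there are 32 batches per epoch, so 4 epochs give 128 steps.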
Usage
You may first need to install transformers version 4.30.2:
pip install transformers==4.30.2
This model was built in plain PyTorch. To load it, define the following model class:
import torch.nn as nn


class ContrastiveModel(nn.Module):
    def __init__(self, model):
        super(ContrastiveModel, self).__init__()
        self.model = model  # underlying transformer encoder
        self.embedding_dim = model.config.hidden_size
        self.fc = nn.Linear(self.embedding_dim, self.embedding_dim)  # projection layer
        self.classifier = nn.Linear(self.embedding_dim, 2)  # Classification layer

    def forward(self, input_ids, attention_mask):
        outputs = self.model(input_ids, attention_mask)
        embeddings = outputs.last_hidden_state[:, 0]  # Use the CLS token embedding as the representation
        embeddings = self.fc(embeddings)
        logits = self.classifier(embeddings)  # Apply classification layer
        return embeddings, logits
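To check what the wrapper returns without downloading any weights, the sketch below runs it on a tiny stand-in encoder. `ToyEncoder` is a hypothetical placeholder that only mimics the two things the wrapper relies on (`config.hidden_size` and an output with `last_hidden_state`); it is not part of the released model. The wrapper class is repeated so the snippet runs on its own.

```python
from types import SimpleNamespace

import torch
import torch.nn as nn


class ContrastiveModel(nn.Module):
    """Same wrapper as in this model card, repeated for self-containment."""

    def __init__(self, model):
        super().__init__()
        self.model = model
        self.embedding_dim = model.config.hidden_size
        self.fc = nn.Linear(self.embedding_dim, self.embedding_dim)
        self.classifier = nn.Linear(self.embedding_dim, 2)

    def forward(self, input_ids, attention_mask):
        outputs = self.model(input_ids, attention_mask)
        embeddings = self.fc(outputs.last_hidden_state[:, 0])
        return embeddings, self.classifier(embeddings)


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for a BERT-style encoder (illustration only)."""

    def __init__(self, vocab_size=100, hidden_size=16):
        super().__init__()
        self.config = SimpleNamespace(hidden_size=hidden_size)
        self.embed = nn.Embedding(vocab_size, hidden_size)

    def forward(self, input_ids, attention_mask):
        # Zero out padded positions, then wrap the result like a transformers output.
        hidden = self.embed(input_ids) * attention_mask.unsqueeze(-1)
        return SimpleNamespace(last_hidden_state=hidden)


toy_model = ContrastiveModel(ToyEncoder())
input_ids = torch.randint(0, 100, (3, 7))  # batch of 3 sequences, 7 tokens each
attention_mask = torch.ones_like(input_ids)

embeddings, logits = toy_model(input_ids, attention_mask)
print(embeddings.shape)  # torch.Size([3, 16])
print(logits.shape)      # torch.Size([3, 2])
```

The forward pass yields one embedding per input sequence plus a pair of class logits, which is the shape the prediction code further down expects.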
Then, we instantiate the model as:
from transformers import AutoModel, AutoTokenizer, AutoConfig
repo_name = "BenjaminOcampo/peace_cont_bert"
config = AutoConfig.from_pretrained(repo_name)
contrastive_model = ContrastiveModel(AutoModel.from_config(config))
tokenizer = AutoTokenizer.from_pretrained(repo_name)
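Finally, load the model weights as follows:
import torch
from huggingface_hub import hf_hub_download
read_token = None  # set to your Hugging Face access token if the repo requires one
model_tmp_file = hf_hub_download(repo_id=repo_name, filename="model.pt", token=read_token)
state_dict = torch.load(model_tmp_file, map_location="cpu")  # load on CPU regardless of the training device
contrastive_model.load_state_dict(state_dict)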
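The same save-and-load pattern can be tried locally before touching the real checkpoint. The sketch below uses a small `nn.Linear` layer as a hypothetical stand-in for the full model and inspects the value returned by `load_state_dict`, which reports any missing or unexpected parameter names.

```python
import os
import tempfile

import torch
import torch.nn as nn

layer = nn.Linear(4, 2)  # hypothetical stand-in for the full model

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pt")
    torch.save(layer.state_dict(), path)
    state_dict = torch.load(path, map_location="cpu")

restored = nn.Linear(4, 2)
result = restored.load_state_dict(state_dict)  # strict=True by default
print(result.missing_keys, result.unexpected_keys)  # [] []
```

Two empty lists mean every parameter in the checkpoint matched a parameter in the model. With the default strict loading, a mismatch raises an error instead.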
You can then make predictions as with any PyTorch model:
import torch
text = "Are you sure that Islam is a peaceful religion?"
inputs = tokenizer(text, return_tensors="pt")
contrastive_model.eval()  # disable dropout for inference
with torch.no_grad():
    _, logits = contrastive_model(inputs["input_ids"], inputs["attention_mask"])
probabilities = torch.softmax(logits, dim=1)
_, predicted_labels = torch.max(probabilities, dim=1)
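To make the last two steps concrete, here is a small sketch that applies them to made-up logits for a batch of two texts. The logit values are hypothetical, not real model outputs, and this model card does not state which class index corresponds to hate speech, so the indices are reported as-is.

```python
import torch

# Hypothetical logits for a batch of two texts (not real model outputs).
logits = torch.tensor([[2.0, -1.0],
                       [0.5, 1.5]])

probabilities = torch.softmax(logits, dim=1)  # each row sums to 1
confidence, predicted_labels = torch.max(probabilities, dim=1)

for label, score in zip(predicted_labels.tolist(), confidence.tolist()):
    print(f"class index {label} with probability {score:.3f}")
```

`torch.max` along `dim=1` returns both the highest probability in each row and its index, so the first value can serve as a confidence score instead of being discarded.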
Datasets
The model was trained on the training split of the ISHate dataset, focusing on implicit hate speech.
Evaluation Results
The model's performance was evaluated using standard metrics, including F1 score and accuracy. For comprehensive evaluation results, refer to the linked paper.
Authors: