metadata

license: mit
language: es
tags:
  - spanish
metrics:
  - ROC-AUC
widget:
  - text: Sos pero bien imbécil!
  - text: Tirate de un puente!
  - text: sapo, gonorrea de mierda
  - text: Esta perrita me las va pagar

colombian-spanish-cyberbullying-classifier

This model is a fine-tuned version of PlanTL-GOB-ES/roberta-base-bne for detecting cyberbullying in Spanish. It was trained on a dataset of posts gathered manually from Twitter.

Training and evaluation data

The dataset is small: 3570 tweets, each manually labeled as cyberbullying or not cyberbullying. The two classes are equally represented.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
weight_decay: 0.01
warmup_steps: 500
num_epochs: 2
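
As a rough illustration, the values above could be passed to the Trainer API along these lines. This is a minimal sketch, not the author's training script: it assumes that train_batch_size and eval_batch_size correspond to the per-device batch sizes, and output_dir is a hypothetical path that does not come from this card.

```python
# Hyperparameters copied from the list above. Mapping the batch sizes to the
# per-device arguments is an assumption.
HYPERPARAMS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "weight_decay": 0.01,
    "warmup_steps": 500,
    "num_train_epochs": 2,
}


def build_training_args(output_dir="./cyberbullying-classifier"):
    """Build TrainingArguments from HYPERPARAMS.

    output_dir is a hypothetical path, not taken from the model card.
    """
    # Imported lazily so the dictionary above can be inspected even when
    # transformers is not installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```

Calling build_training_args() returns an object that can be handed to transformers.Trainer together with the model and the tokenized dataset.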

Training results

Training Loss	Epoch	ROC-AUC	Validation Loss
---	1.0	0.8756	0.4375
0.4945	2.0	0.9022	0.5060
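
For readers unfamiliar with the metric, ROC-AUC is the probability that a randomly chosen cyberbullying tweet receives a higher score than a randomly chosen non-cyberbullying one. The sketch below computes it in plain Python from raw logits. It is only an illustration: the toy logits and labels are invented, and treating index 1 as the positive (bullying) class is an assumption. In practice, sklearn.metrics.roc_auc_score is the usual way to compute the same quantity.

```python
import math


def positive_class_probability(logits, positive_index=1):
    """Softmax over one example's logits; returns the positive-class probability."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[positive_index] / sum(exps)


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. The loop is O(n_pos * n_neg), which is
    acceptable for a few thousand examples.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy logits and labels, invented for illustration only.
toy_logits = [[2.0, -1.0], [0.2, 0.1], [0.5, 0.3], [-1.5, 2.5]]
toy_labels = [0, 0, 1, 1]
toy_scores = [positive_class_probability(row) for row in toy_logits]
print(round(roc_auc(toy_labels, toy_scores), 4))  # prints 0.75
```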

Model in action 🚀

Fast usage with pipelines:

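# Notebook shell command; in a terminal, run `pip install -q transformers` instead.
!pip install -q transformers
from transformers import pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model_path = "FelipeGuerra/colombian-spanish-cyberbullying-classifier"
bullying_analysis = pipeline("text-classification", model=model_path, tokenizer=model_path)

# A harmless colloquial expression about heavy rain.
bullying_analysis("Como dice mi mamá: va caer palo de agua")
# Output:
# [{'label': 'Not_bullying', 'score': 0.977687656879425}]

# An insulting, threatening message.
bullying_analysis("Esta perrita me las va pagar")
# Output:
# [{'label': 'Bullying', 'score': 0.9404164552688599}]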
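
When screening several messages at once, the pipeline can be called with a list of strings, and it returns one label/score dictionary per input. The helper below is a hypothetical post-processing sketch, not part of the model: flag_bullying and its threshold of 0.9 are illustrative choices. To stay runnable without downloading the model, it reuses the two outputs shown above.

```python
def flag_bullying(texts, results, threshold=0.9):
    """Return (text, score) pairs labeled 'Bullying' with score >= threshold.

    `results` holds one {'label': ..., 'score': ...} dict per text, as the
    pipeline returns for a list of inputs. The threshold is an arbitrary
    illustrative value, not a tuned one.
    """
    flagged = []
    for text, result in zip(texts, results):
        if result["label"] == "Bullying" and result["score"] >= threshold:
            flagged.append((text, result["score"]))
    return flagged


texts = [
    "Como dice mi mamá: va caer palo de agua",
    "Esta perrita me las va pagar",
]
# Outputs copied from the two examples above. With the model loaded, this
# would be: results = bullying_analysis(texts)
results = [
    {"label": "Not_bullying", "score": 0.977687656879425},
    {"label": "Bullying", "score": 0.9404164552688599},
]
print(flag_bullying(texts, results))
```

Raising the threshold trades recall for precision; a suitable value would need to be chosen on held-out data.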

Framework versions

Transformers 4.34.0
PyTorch 2.0.1+cu118
Pandas 1.5.3
scikit-learn 1.2.2

Created by Felipe Guerra Sáenz | LinkedIn