metadata
license: mit
language: es
tags:
- spanish
metrics:
- ROC-AUC
widget:
- text: Sos pero bien imbécil!
- text: Tirate de un puente!
- text: sapo, gonorrea de mierda
- text: Esta perrita me las va pagar
colombian-spanish-cyberbullying-classifier
This model is a fine-tuned version of PlanTL-GOB-ES/roberta-base-bne for detecting cyberbullying in Spanish. It was trained on a dataset of posts gathered manually from Twitter.
Training and evaluation data
The dataset is small: 3,570 tweets, each manually labeled as cyberbullying or not cyberbullying. The two classes are equally represented.
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- weight_decay: 0.01
- warmup_steps: 500
- num_epochs: 2
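As a rough sketch, the values above could be passed to the Hugging Face `Trainer` along the following lines. The mapping of `train_batch_size` and `eval_batch_size` to the per-device arguments assumes single-device training, and `output_dir` and the helper name `build_training_arguments` are hypothetical; this card does not include the original training script.

```python
# Hyperparameters from the list above, keyed by the TrainingArguments
# parameter names they most plausibly correspond to (an assumption).
HYPERPARAMETERS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,  # assumes a single device
    "per_device_eval_batch_size": 8,   # assumes a single device
    "seed": 42,
    "weight_decay": 0.01,
    "warmup_steps": 500,
    "num_train_epochs": 2,
}


def build_training_arguments(output_dir="./results"):
    """Build TrainingArguments from the values above (hypothetical helper)."""
    # Imported lazily so the dictionary can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMETERS)
```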
Training results
| Training Loss | Epoch | ROC-AUC | Validation Loss |
|---|---|---|---|
| --- | 1.0 | 0.8756 | 0.4375 |
| 0.4945 | 2.0 | 0.9022 | 0.5060 |
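
For readers unfamiliar with the metric, ROC-AUC can be read as the probability that a randomly chosen cyberbullying tweet receives a higher score than a randomly chosen non-cyberbullying one. The sketch below computes it by pairwise comparison on made-up scores. It is illustrative only and is not the evaluation code used for this model; the function name `roc_auc` and the toy data are assumptions.

```python
def roc_auc(labels, scores):
    """Pairwise ROC-AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of examples,
    so only suitable for small illustrations.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one example of each class")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy data (not from the model): 1 = cyberbullying, 0 = not cyberbullying.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 3 of 4 pairs ranked correctly -> 0.75
```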
Model in action 🚀
Fast usage with pipelines:
!pip install -q transformers
from transformers import pipeline
model_path = "FelipeGuerra/colombian-spanish-cyberbullying-classifier"
bullying_analysis = pipeline("text-classification", model=model_path, tokenizer=model_path)
bullying_analysis(
"Como dice mi mamá: va caer palo de agua"
)
# Output:
[{'label': 'Not_bullying', 'score': 0.977687656879425}]
bullying_analysis(
"Esta perrita me las va pagar"
)
# Output:
[{'label': 'Bullying', 'score': 0.9404164552688599}]
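
The pipeline returns a list with one dictionary per input, as in the outputs above. A small helper such as the following could turn a prediction into a yes/no decision. The function name `is_bullying` and the 0.5 default threshold are assumptions for illustration, not part of the model.

```python
def is_bullying(prediction, threshold=0.5):
    """Return True if a pipeline prediction is 'Bullying' with enough confidence.

    `prediction` is one dictionary from the pipeline output,
    e.g. {'label': 'Bullying', 'score': 0.94}.
    """
    return prediction["label"] == "Bullying" and prediction["score"] >= threshold


# Outputs copied from the examples above.
examples = [
    {"label": "Not_bullying", "score": 0.977687656879425},
    {"label": "Bullying", "score": 0.9404164552688599},
]
print([is_bullying(p) for p in examples])  # [False, True]
```

A stricter threshold, for example `is_bullying(p, threshold=0.95)`, flags fewer posts; with that setting the second example would no longer be flagged.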
Framework versions
- Transformers 4.34.0
- Pytorch 2.0.1+cu118
- Pandas 1.5.3
- scikit-learn 1.2.2
Created by Felipe Guerra Sáenz | LinkedIn