metadata
license: mit
language: es
tags:
  - spanish
metrics:
  - ROC-AUC
widget:
  - text: Sos pero bien imbécil!
  - text: Tirate de un puente!
  - text: sapo, gonorrea de mierda
  - text: Esta perrita me las va pagar

colombian-spanish-cyberbullying-classifier

This model is a fine-tuned version of PlanTL-GOB-ES/roberta-base-bne that detects cyberbullying in Spanish. It was trained on a dataset of posts gathered manually from Twitter.

Training and evaluation data

The dataset is small: 3570 tweets, each manually labeled as cyberbullying or not cyberbullying. The two classes are balanced, with the same number of tweets in each.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • weight_decay: 0.01
  • warmup_steps: 500
  • num_epochs: 2
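To make the interaction between `warmup_steps` and such a short training run concrete, the sketch below computes the learning rate at a given optimizer step. It rests on several assumptions that the card does not confirm: a linear warmup followed by linear decay (the default scheduler of the Transformers `Trainer`), a single device with no gradient accumulation, and, as an upper bound, all 3570 tweets being used for training, since the train/validation split is not stated. `N_TRAIN_EXAMPLES`, `total_steps`, and `lr_at` are illustrative names, not part of the original training code.

```python
import math

# Values taken from the hyperparameter list above.
LEARNING_RATE = 2e-05
TRAIN_BATCH_SIZE = 8
NUM_EPOCHS = 2
WARMUP_STEPS = 500

# Assumption: upper bound, as if every tweet were in the training split.
N_TRAIN_EXAMPLES = 3570


def total_steps(n_examples=N_TRAIN_EXAMPLES):
    """Number of optimizer steps over all epochs (no gradient accumulation)."""
    return math.ceil(n_examples / TRAIN_BATCH_SIZE) * NUM_EPOCHS


def lr_at(step, n_total=None):
    """Learning rate at `step` under linear warmup, then linear decay to zero."""
    if n_total is None:
        n_total = total_steps()
    if step < WARMUP_STEPS:
        # Warmup phase: ramp linearly from 0 up to LEARNING_RATE.
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    # Decay phase: ramp linearly from LEARNING_RATE down to 0.
    remaining = max(0, n_total - step)
    return LEARNING_RATE * remaining / max(1, n_total - WARMUP_STEPS)


if __name__ == "__main__":
    for step in (0, 250, 500, 700, total_steps()):
        print(step, lr_at(step))
```

Under these assumptions the run has at most 894 optimizer steps, so a 500-step warmup would cover more than half of training. With a smaller training split the share would be even larger.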

Training results

| Training Loss | Epoch | ROC-AUC | Validation Loss |
|---|---|---|---|
| --- | 1.0 | 0.8756 | 0.4375 |
| 0.4945 | 2.0 | 0.9022 | 0.5060 |
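For readers unfamiliar with the metric reported above, ROC-AUC can be read as the probability that a randomly chosen cyberbullying tweet receives a higher score than a randomly chosen non-cyberbullying one. The following is a minimal pure-Python sketch of that pairwise definition. The function name `roc_auc` and the toy labels and scores are hypothetical, and the card does not say how the reported values were computed (scikit-learn's `roc_auc_score` computes the same quantity).

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 ground-truth labels (1 = cyberbullying).
    scores: iterable of model scores for the positive class.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one example of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the model's evaluation set.
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(toy_labels, toy_scores))
```

The pairwise loop is quadratic in the number of examples, which is acceptable for a dataset of this size. A rank-based implementation would be the usual choice for larger evaluation sets.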

Model in action 🚀

Fast usage with pipelines:

# In a notebook, install the dependency first:
# !pip install -q transformers
from transformers import pipeline

model_path = "FelipeGuerra/colombian-spanish-cyberbullying-classifier"
# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
bullying_analysis = pipeline("text-classification", model=model_path, tokenizer=model_path)

bullying_analysis("Como dice mi mamá: va caer palo de agua")
# Output:
# [{'label': 'Not_bullying', 'score': 0.977687656879425}]

bullying_analysis("Esta perrita me las va pagar")
# Output:
# [{'label': 'Bullying', 'score': 0.9404164552688599}]
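The pipeline returns only the top label and its score, so downstream code often needs a single cyberbullying probability to compare against a threshold. The sketch below converts a prediction of the shape shown above into such a probability. It assumes the pipeline applies a softmax over exactly the two labels `Bullying` and `Not_bullying`, so that their scores sum to 1. The helper names `bullying_probability` and `is_bullying` and the default threshold of 0.5 are illustrative choices, not part of the model or the library.

```python
def bullying_probability(prediction):
    """Return the probability of the 'Bullying' class from one pipeline result.

    prediction: a dict such as {'label': 'Bullying', 'score': 0.94}.
    Assumes a two-class softmax, so P(Bullying) = 1 - P(Not_bullying).
    """
    label, score = prediction["label"], prediction["score"]
    if label == "Bullying":
        return score
    if label == "Not_bullying":
        return 1.0 - score
    raise ValueError(f"Unexpected label: {label!r}")


def is_bullying(prediction, threshold=0.5):
    """Flag a prediction as cyberbullying if its probability reaches the threshold."""
    return bullying_probability(prediction) >= threshold


if __name__ == "__main__":
    # The two example outputs shown above, copied as plain dicts.
    examples = [
        {"label": "Not_bullying", "score": 0.977687656879425},
        {"label": "Bullying", "score": 0.9404164552688599},
    ]
    for example in examples:
        print(bullying_probability(example), is_bullying(example))
```

Raising the threshold trades recall for precision. A suitable value depends on the application and is not specified by the model card.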
    

Framework versions

  • Transformers 4.34.0
  • PyTorch 2.0.1+cu118
  • Pandas 1.5.3
  • scikit-learn 1.2.2

Created by Felipe Guerra Sáenz | LinkedIn