metadata
license: cc-by-4.0
datasets:
  - ptaszynski/PolishCyberbullyingDataset
language:
  - pl
tags:
  - cyberbullying
  - hate-speech

Polbert-CB - Polish BERT trained for Automatic Cyberbullying Detection

This is a Polish BERT language model, specifically Polbert, fine-tuned on a re-annotated and improved version of the Dataset for Automatic Cyberbullying Detection in Polish Language.
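A fine-tuned BERT classifier of this kind is usually loaded through the `transformers` text-classification pipeline. The following is a minimal sketch, not an official usage recipe. It assumes the model is published on the Hugging Face Hub under the id `ptaszynski/bert-base-polish-cyberbullying` (inferred from the repository name in the citation URL) and that the checkpoint includes a sequence-classification head. The helper names `load_classifier` and `classify` are hypothetical, and the label names the model returns are not documented here, so check the model's configuration before interpreting them.

```python
from typing import Callable, Dict, List

# Assumption: Hub id inferred from the repository name in the citation URL.
MODEL_ID = "ptaszynski/bert-base-polish-cyberbullying"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline for the given model id."""
    # Imported lazily so that `classify` can be used without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(texts: List[str], classifier: Callable) -> List[Dict]:
    """Run the classifier on a list of texts and pair each text with its top prediction."""
    predictions = classifier(texts)
    return [
        {"text": text, "label": pred["label"], "score": float(pred["score"])}
        for text, pred in zip(texts, predictions)
    ]
```

Calling `classify(["Miłego dnia!"], load_classifier())` would download the model on first use and return one dictionary per input text, containing the predicted label and its score.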

Fine-tuning dataset

The dataset used for fine-tuning this model is based on the original Dataset for Automatic Cyberbullying Detection in Polish Language, which was additionally cleaned and re-annotated by experts from Samurai Labs. The improved dataset was released separately; see the "Improved dataset" citation below.

Acknowledgements

  • We would like to express our gratitude to the annotators of this dataset, both the original annotators and the more recent expert annotators, for the invaluable time they spent preparing the dataset.

Author

Michal Ptaszynski - contact me on:

Licences

The fine-tuned model with all attached files is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).


Citations

Please cite this model using the following citation.

Model:

@article{ptaszynski2022cyberbullyibng-bert-pl,
  title={Polish BERT trained for Automatic Cyberbullying Detection},
  author={Ptaszynski, Michal and Pieciukiewicz, Agata and Dybala, Pawel and Skrzek, Pawel and Soliwoda, Kamil and Fortuna, Marcin and Leliwa, Gniewosz and Wroczynski, Michal},
  year={2022},
  publisher={HuggingFace},
  url={https://github.com/ptaszynski/bert-base-polish-cyberbullying}
}

Original dataset:

@article{ptaszynski2019results,
  title={Results of the poleval 2019 shared task 6: First dataset and open shared task for automatic cyberbullying detection in polish twitter},
  author={Ptaszynski, Michal and Pieciukiewicz, Agata and Dyba{\l}a, Pawe{\l}},
  year={2019},
  publisher={Warszawa: Institute of Computer Sciences. Polish Academy of Sciences}
}

Improved dataset:

The improved dataset used for training this model was released in the following publication: Expert-Annotated Dataset to Study Cyberbullying in Polish Language.

@article{ptaszynski2023expert,
  title={Expert-Annotated Dataset to Study Cyberbullying in Polish Language},
  author={Ptaszynski, Michal and Pieciukiewicz, Agata and Dybala, Pawel and Skrzek, Pawel and Soliwoda, Kamil and Fortuna, Marcin and Leliwa, Gniewosz and Wroczynski, Michal},
  journal={Data},
  volume={9},
  number={1},
  pages={1},
  year={2023},
  publisher={MDPI}
}
