dsai-wojciech-szmyd committed on
Commit
27e7f59
1 Parent(s): 37cd418

Update README.md

Files changed (1)
  1. README.md +23 -0
README.md CHANGED
@@ -59,6 +59,29 @@ For fine-tuning to KLEJ tasks we used [Polish RoBERTa](https://github.com/sdadas
 
 Our model achieved 1st place in the cyberbullying detection (CBD) task on the [KLEJ leaderboard](https://klejbenchmark.com/leaderboard). Overall, it reached 7th place, just below the HerBERT model.
 
+## Citation
+Please cite the following paper:
+```
+@inproceedings{szmyd-etal-2023-trelbert,
+    title = "{T}rel{BERT}: A pre-trained encoder for {P}olish {T}witter",
+    author = "Szmyd, Wojciech and
+      Kotyla, Alicja and
+      Zobni{\'o}w, Micha{\l} and
+      Falkiewicz, Piotr and
+      Bartczuk, Jakub and
+      Zygad{\l}o, Artur",
+    booktitle = "Proceedings of the 9th Workshop on Slavic Natural Language Processing 2023 (SlavicNLP 2023)",
+    month = may,
+    year = "2023",
+    address = "Dubrovnik, Croatia",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2023.bsnlp-1.3",
+    pages = "17--24",
+    abstract = "Pre-trained Transformer-based models have become immensely popular amongst NLP practitioners. We present TrelBERT {--} the first Polish language model suited for application in the social media domain. TrelBERT is based on an existing general-domain model and adapted to the language of social media by pre-training it further on a large collection of Twitter data. We demonstrate its usefulness by evaluating it in the downstream task of cyberbullying detection, in which it achieves state-of-the-art results, outperforming larger monolingual models trained on general-domain corpora, as well as multilingual in-domain models, by a large margin. We make the model publicly available. We also release a new dataset for the problem of harmful speech detection.",
+}
+```
+
 ## Authors
 
 Jakub Bartczuk, Krzysztof Dziedzic, Piotr Falkiewicz, Alicja Kotyla, Wojciech Szmyd, Michał Zobniów, Artur Zygadło
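For context, a minimal usage sketch of loading the model this README documents with the Hugging Face `transformers` library. The model id `deepsense-ai/trelbert` and the example sentence are illustrative assumptions, not confirmed by this commit; substitute the id shown on the model page.

```python
# Minimal sketch: querying the masked-language-model encoder with the fill-mask pipeline.
# Assumption: the model id below is hypothetical; replace it with the id on the model page.
from transformers import pipeline

MODEL_ID = "deepsense-ai/trelbert"  # hypothetical identifier

# Load the fill-mask pipeline for the pre-trained encoder.
fill_mask = pipeline("fill-mask", model=MODEL_ID)
mask = fill_mask.tokenizer.mask_token  # use the tokenizer's own mask token

# Score candidate completions for a short, tweet-like Polish sentence (illustrative).
for prediction in fill_mask(f"Ale dzisiaj {mask} pogoda!"):
    print(prediction["token_str"], round(prediction["score"], 3))
```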