dsai-wojciech-szmyd committed
Commit 27e7f59
Parent(s): 37cd418
Update README.md

README.md CHANGED
@@ -59,6 +59,29 @@ For fine-tuning to KLEJ tasks we used [Polish RoBERTa](https://github.com/sdadas
 
 Our model achieved 1st place in the cyberbullying detection (CBD) task on the [KLEJ leaderboard](https://klejbenchmark.com/leaderboard). Overall, it reached 7th place, just below the HerBERT model.
 
+## Citation
+Please cite the following paper:
+```
+@inproceedings{szmyd-etal-2023-trelbert,
+    title = "{T}rel{BERT}: A pre-trained encoder for {P}olish {T}witter",
+    author = "Szmyd, Wojciech and
+      Kotyla, Alicja and
+      Zobni{\'o}w, Micha{\l} and
+      Falkiewicz, Piotr and
+      Bartczuk, Jakub and
+      Zygad{\l}o, Artur",
+    booktitle = "Proceedings of the 9th Workshop on Slavic Natural Language Processing 2023 (SlavicNLP 2023)",
+    month = may,
+    year = "2023",
+    address = "Dubrovnik, Croatia",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2023.bsnlp-1.3",
+    pages = "17--24",
+    abstract = "Pre-trained Transformer-based models have become immensely popular amongst NLP practitioners. We present TrelBERT {--} the first Polish language model suited for application in the social media domain. TrelBERT is based on an existing general-domain model and adapted to the language of social media by pre-training it further on a large collection of Twitter data. We demonstrate its usefulness by evaluating it in the downstream task of cyberbullying detection, in which it achieves state-of-the-art results, outperforming larger monolingual models trained on general-domain corpora, as well as multilingual in-domain models, by a large margin. We make the model publicly available. We also release a new dataset for the problem of harmful speech detection.",
+}
+
+```
+
 ## Authors
 
 Jakub Bartczuk, Krzysztof Dziedzic, Piotr Falkiewicz, Alicja Kotyla, Wojciech Szmyd, Michał Zobniów, Artur Zygadło
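The README this commit updates documents TrelBERT, a BERT-style masked language model adapted to Polish Twitter. As a minimal usage sketch (not the authors' official example), the model can be queried for masked-token prediction with the Hugging Face `transformers` pipeline; the repository id `deepsense-ai/trelbert` is an assumption, so check the model card for the exact identifier:

```
# Minimal sketch of masked-token prediction with TrelBERT.
# NOTE: the repository id below is an assumption -- verify it on the model card.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="deepsense-ai/trelbert")  # assumed repo id

# Read the mask token from the tokenizer so the snippet works whether the
# vocabulary uses `[MASK]` or `<mask>`.
text = f"Ale {fill_mask.tokenizer.mask_token} dzisiaj pogoda!"  # "What ... weather today!"
for prediction in fill_mask(text):
    print(prediction["token_str"], round(prediction["score"], 3))
```

For the cyberbullying-detection result cited above, the natural starting point for reproduction would be fine-tuning with a sequence-classification head (e.g. `AutoModelForSequenceClassification`) on the KLEJ CBD data, following the Polish RoBERTa fine-tuning code linked in the README.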