rcds
/

Commit a0d186f by tbrugger
1 parent: 012ee79

Create README.md

Files changed (1)
  1. README.md +21 -0
README.md ADDED
@@ -0,0 +1,21 @@
+ ## Citation
+
+ ```
+ @inproceedings{10.1145/3594536.3595132,
+ author = {Brugger, Tobias and St\"{u}rmer, Matthias and Niklaus, Joel},
+ title = {MultiLegalSBD: A Multilingual Legal Sentence Boundary Detection Dataset},
+ year = {2023},
+ isbn = {9798400701979},
+ publisher = {Association for Computing Machinery},
+ address = {New York, NY, USA},
+ url = {https://doi.org/10.1145/3594536.3595132},
+ doi = {10.1145/3594536.3595132},
+ abstract = {Sentence Boundary Detection (SBD) is one of the foundational building blocks of Natural Language Processing (NLP), with incorrectly split sentences heavily influencing the output quality of downstream tasks. It is a challenging task for algorithms, especially in the legal domain, considering the complex and different sentence structures used. In this work, we curated a diverse multilingual legal dataset consisting of over 130'000 annotated sentences in 6 languages. Our experimental results indicate that the performance of existing SBD models is subpar on multilingual legal data. We trained and tested monolingual and multilingual models based on CRF, BiLSTM-CRF, and transformers, demonstrating state-of-the-art performance. We also show that our multilingual models outperform all baselines in the zero-shot setting on a Portuguese test set. To encourage further research and development by the community, we have made our dataset, models, and code publicly available.},
+ booktitle = {Proceedings of the Nineteenth International Conference on Artificial Intelligence and Law},
+ pages = {42–51},
+ numpages = {10},
+ keywords = {Natural Language Processing, Sentence Boundary Detection, Text Annotation, Legal Document Analysis, Multilingual},
+ location = {Braga, Portugal},
+ series = {ICAIL '23}
+ }
+ ```