w11wo/sundanese-roberta-base

w11wo committed on Feb 26, 2022

Commit 14f10ed • 1 Parent(s): 228f4de

Update README.md

Files changed (1)

README.md +18 -0

@@ -73,3 +73,21 @@ Do consider the biases which came from all four datasets that may be carried ove
 ## Author
 Sundanese RoBERTa Base was trained and evaluated by [Wilson Wongso](https://w11wo.github.io/).

+## Citation Information
+```bib
+@article{rs-907893,
+    author   = {Wongso, Wilson
+                and Lucky, Henry
+                and Suhartono, Derwin},
+    journal  = {Journal of Big Data},
+    year     = {2022},
+    month    = {Feb},
+    day      = {26},
+    abstract = {The Sundanese language has over 32 million speakers worldwide, but the language has reaped little to no benefits from the recent advances in natural language understanding. Like other low-resource languages, the only alternative is to fine-tune existing multilingual models. In this paper, we pre-trained three monolingual Transformer-based language models on Sundanese data. When evaluated on a downstream text classification task, we found that most of our monolingual models outperformed larger multilingual models despite the smaller overall pre-training data. In the subsequent analyses, our models benefited strongly from the Sundanese pre-training corpus size and do not exhibit socially biased behavior. We released our models for other researchers and practitioners to use.},
+    issn     = {2693-5015},
+    doi      = {10.21203/rs.3.rs-907893/v1},
+    url      = {https://doi.org/10.21203/rs.3.rs-907893/v1}
+}
+```