w11wo committed on
Commit fe065ad
1 Parent(s): fd2fb55

Update README.md

Files changed (1)
  1. README.md +18 -0
README.md CHANGED
@@ -73,3 +73,21 @@ Do consider the biases which came from all four datasets that may be carried ove
 ## Author
 
 Sundanese GPT-2 Base was trained and evaluated by [Wilson Wongso](https://w11wo.github.io/).
+
+## Citation Information
+
+```bib
+@article{rs-907893,
+author = {Wongso, Wilson
+and Lucky, Henry
+and Suhartono, Derwin},
+journal = {Journal of Big Data},
+year = {2022},
+month = {Feb},
+day = {26},
+abstract = {The Sundanese language has over 32 million speakers worldwide, but the language has reaped little to no benefits from the recent advances in natural language understanding. Like other low-resource languages, the only alternative is to fine-tune existing multilingual models. In this paper, we pre-trained three monolingual Transformer-based language models on Sundanese data. When evaluated on a downstream text classification task, we found that most of our monolingual models outperformed larger multilingual models despite the smaller overall pre-training data. In the subsequent analyses, our models benefited strongly from the Sundanese pre-training corpus size and do not exhibit socially biased behavior. We released our models for other researchers and practitioners to use.},
+issn = {2693-5015},
+doi = {10.21203/rs.3.rs-907893/v1},
+url = {https://doi.org/10.21203/rs.3.rs-907893/v1}
+}
+```