w11wo/sundanese-gpt2-base-emotion-classifier

@@ -64,18 +64,20 @@ Do consider the biases which come from both the pre-trained RoBERTa model and th
 Sundanese GPT-2 Base Emotion Classifier was trained and evaluated by [Wilson Wongso](https://w11wo.github.io/). All computation and development are done on Google Colaboratory using their free GPU access.
-## Credits
-```
-@inproceedings{Putr2011:Sundanese,
-	title        = {Sundanese Twitter Dataset for Emotion Classification},
-	author       = {Oddy Virgantara Putra and Fathin Muhammad Wasmanson and Triana Harmini and Shoffin Nahwa Utama},
-	year         = 2020,
-	month        = nov,
-	booktitle    = {2020 International Conference on Computer Engineering, Network, and Intelligent Multimedia (CENIM) (CENIM 2020)},
-	address      = virtual,
-	days         = 16,
-	keywords     = {emotion classification; sundanese; machine learning},
-	abstract     = {Sundanese is the second-largest tribe in Indonesia which possesses many dialects. This condition has gained attention for many researchers to analyze emotion especially on social media. However, with barely available Sundanese dataset, this condition makes understanding sundanese emotion is a challenging task. In this research, we proposed a dataset for emotion classification of Sundanese text. The preprocessing includes case folding, stopwords removal, stemming, tokenizing, and text representation. Prior to classification, for the feature generation, we utilize term frequency-inverse document frequency (TFIDF). We evaluated our dataset using k-Fold Cross Validation. Our experiments with the proposed method exhibit an effective result for machine learning classification. Furthermore, as far as we know, this is the first Sundanese emotion dataset available for public.}
 }
-```

 Sundanese GPT-2 Base Emotion Classifier was trained and evaluated by [Wilson Wongso](https://w11wo.github.io/). All computation and development are done on Google Colaboratory using their free GPU access.
+## Citation Information
+```bib
+@article{rs-907893,
+    author   = {Wongso, Wilson
+                and Lucky, Henry
+                and Suhartono, Derwin},
+    journal  = {Journal of Big Data},
+    year     = {2022},
+    month    = {Feb},
+    day      = {26},
+    abstract = {The Sundanese language has over 32 million speakers worldwide, but the language has reaped little to no benefits from the recent advances in natural language understanding. Like other low-resource languages, the only alternative is to fine-tune existing multilingual models. In this paper, we pre-trained three monolingual Transformer-based language models on Sundanese data. When evaluated on a downstream text classification task, we found that most of our monolingual models outperformed larger multilingual models despite the smaller overall pre-training data. In the subsequent analyses, our models benefited strongly from the Sundanese pre-training corpus size and do not exhibit socially biased behavior. We released our models for other researchers and practitioners to use.},
+    issn     = {2693-5015},
+    doi      = {10.21203/rs.3.rs-907893/v1},
+    url      = {https://doi.org/10.21203/rs.3.rs-907893/v1}
 }
+```