Paper: [idT5: Indonesian Version of Multilingual T5 Transformer](https://arxiv.org/abs/2302.00856)

Authors: *Mukhlish Fuadi, Adhi Dharma Wibawa, Surya Sumpeno*

## Citation

```
@misc{https://doi.org/10.48550/arxiv.2302.00856,
  doi = {10.48550/ARXIV.2302.00856},
  url = {https://arxiv.org/abs/2302.00856},
  author = {Fuadi, Mukhlish and Wibawa, Adhi Dharma and Sumpeno, Surya},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, I.2.7},
  title = {idT5: Indonesian Version of Multilingual T5 Transformer},
  publisher = {arXiv},
  year = {2023}
}
```

## Abstract

The Indonesian language is spoken by almost 200 million people and is the 10th most spoken language in the world, yet it is under-represented in NLP (Natural Language Processing) research. A sparsity of language resources has hampered previous work on Indonesian. The Transformer is a new architecture that is rapidly becoming dominant in NLP, surpassing alternatives like convolutional and recurrent neural networks. T5 (Text-to-Text Transfer Transformer) is a Transformer model that converts all text-based language problems into a text-to-text format for English. Its multilingual variant, mT5 (multilingual T5), has shown promising results on many NLP tasks across languages. However, the size of this multilingual model is a drawback for real production applications, which sometimes require only one language. In this study, the mT5 model was adapted to a single language, Indonesian, resulting in a smaller pre-trained T5 model specific to Indonesian. For performance comparison, we fine-tuned both this model and the mT5 model on the Sentiment Analysis (SA), Question Generation (QG), and Question Answering (QA) tasks with the same mechanism and datasets. The model fine-tuned from our model achieved 77.18% accuracy on SA, 8% higher than the mT5-based model, and obtained nearly the same scores as the mT5-based model on QG and QA. The results confirm that it is possible to produce a smaller pre-trained model that maintains comparable performance while reducing the model size by up to 58%. In addition, the resulting model requires less memory, loads faster, and has faster inference times.
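
Since the model follows the T5 text-to-text setup described above, it can be loaded and queried like any other seq2seq checkpoint with the Hugging Face `transformers` library. The sketch below is illustrative only: the repository ID, the example sentence, and the prompt format are assumptions not specified by this README, so substitute the actual published checkpoint ID.

```python
# Minimal usage sketch (assumptions: model ID and prompt format are placeholders,
# not confirmed by this README).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "muchad/idt5-base"  # assumption: replace with the published checkpoint ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# T5-style models cast every task as text-to-text, so the input is a plain string.
text = "Film ini sangat menarik dan menyentuh."  # "This film is very interesting and touching."
inputs = tokenizer(text, return_tensors="pt")

# Generate the output text for the (fine-tuned) task, e.g. a sentiment label.
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```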