w11wo committed bcf712c (1 parent: 98a2449)

Updated README with the new model replacement

Files changed (1):
  1. README.md (+5 -6)
README.md CHANGED
@@ -12,22 +12,21 @@ widget:
 ## Javanese BERT Small
 Javanese BERT Small is a masked language model based on the [BERT model](https://arxiv.org/abs/1810.04805). It was trained on the latest (late December 2020) Javanese Wikipedia articles.
 
-The model was trained from scratch and achieved a perplexity of 93.03 on the validation dataset (20% of the articles). Many of the techniques used
-are based on a Hugging Face tutorial [notebook](https://github.com/huggingface/notebooks/blob/master/examples/language_modeling.ipynb) written by [Sylvain Gugger](https://github.com/sgugger), where Sylvain Gugger fine-tuned a [DistilGPT-2](https://huggingface.co/distilgpt2) on [Wikitext2](https://render.githubusercontent.com/view/ipynb?color_mode=dark&commit=43d63e390e8a82f7ae49aa1a877419343a213cb4&enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f68756767696e67666163652f6e6f7465626f6f6b732f343364363365333930653861383266376165343961613161383737343139333433613231336362342f6578616d706c65732f6c616e67756167655f6d6f64656c696e672e6970796e62&nwo=huggingface%2Fnotebooks&path=examples%2Flanguage_modeling.ipynb&repository_id=272452525&repository_type=Repository).
+The model was originally Hugging Face's pretrained [English BERT model](https://huggingface.co/bert-base-uncased) and was later fine-tuned on the Javanese dataset. It achieved a perplexity of 49.43 on the validation dataset (20% of the articles). Many of the techniques used are based on a Hugging Face tutorial [notebook](https://github.com/huggingface/notebooks/blob/master/examples/language_modeling.ipynb) written by [Sylvain Gugger](https://github.com/sgugger), and a [fine-tuning tutorial notebook](https://github.com/piegu/fastai-projects/blob/master/finetuning-English-GPT2-any-language-Portuguese-HuggingFace-fastaiv2.ipynb) written by [Pierre Guillou](https://huggingface.co/pierreguillou).
 
 Hugging Face's [Transformers](https://huggingface.co/transformers) library was used to train the model -- utilizing the base BERT model and their `Trainer` class. PyTorch was used as the backend framework during training, but the model remains compatible with TensorFlow nonetheless.
 
 ## Model
-| Model                 | #params | Arch. | Training/Validation data (text)     |
-|-----------------------|---------|-------|-------------------------------------|
-| `javanese-BERT-small` | 84M     | BERT  | Javanese Wikipedia (319 MB of text) |
+| Model                 | #params | Arch.      | Training/Validation data (text)     |
+|-----------------------|---------|------------|-------------------------------------|
+| `javanese-bert-small` | 110M    | BERT Small | Javanese Wikipedia (319 MB of text) |
 
 ## Evaluation Results
 The model was trained for 15 epochs, and the table below shows the final results once training ended.
 
 | train loss | valid loss | perplexity | total time |
 |------------|------------|------------|------------|
-| 4.539      | 4.533      | 93.03      | 2:31:33    |
+| 3.918      | 3.900      | 49.43      | 5:19:36    |
 
 ## How to Use
 ### As Masked Language Model
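
The diff ends right at the `### As Masked Language Model` heading, so the usage snippet itself is not part of this commit. For reference, a minimal sketch of the kind of fill-mask call that section typically contains, using the Transformers `pipeline` API; the repository id `w11wo/javanese-bert-small` and the example sentence are assumptions (inferred from the committer and model name), not taken from the diff:

```python
from transformers import pipeline

# Assumed repository id (committer + model name); not stated in the diff itself.
MODEL_ID = "w11wo/javanese-bert-small"

# BERT-style masked language models predict the token hidden behind [MASK].
fill_mask = pipeline("fill-mask", model=MODEL_ID, tokenizer=MODEL_ID)

# Illustrative Javanese sentence with a single masked token.
for prediction in fill_mask("Aku mangan sega ing [MASK]."):
    print(f"{prediction['sequence']}  (score: {prediction['score']:.4f})")
```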
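The updated description says the model started from Hugging Face's pretrained English BERT and was fine-tuned on Javanese Wikipedia with the `Trainer` class. A rough sketch of that setup under the usual masked-LM recipe from the linked tutorial notebooks is shown below; the toy sentences, batch size, and any hyperparameter not named in the README are illustrative placeholders, not taken from the commit:

```python
import math

from datasets import Dataset
from transformers import (
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Start from the pretrained English BERT checkpoint named in the README,
# then continue training with the masked-LM objective on Javanese text.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Toy stand-in for the Javanese Wikipedia dump; the real corpus
# (319 MB of text) and its preparation are not part of this diff.
raw = Dataset.from_dict({"text": [
    "Aku mangan sega ing omah.",
    "Basa Jawa iku kalebu basa Austronesia.",
    "Kutha Yogyakarta ana ing Pulo Jawa.",
    "Dheweke lagi sinau basa Jawa.",
    "Wikipedia iku ensiklopedia bebas.",
]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
split = tokenized.train_test_split(test_size=0.2)  # README reports a 20% validation split

# Mask 15% of tokens, the standard BERT pre-training setting.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

args = TrainingArguments(
    output_dir="javanese-bert-small",
    num_train_epochs=15,            # epoch count reported in the README
    per_device_train_batch_size=8,  # illustrative value, not from the diff
)

trainer = Trainer(
    model=model,
    args=args,
    data_collator=collator,
    train_dataset=split["train"],
    eval_dataset=split["test"],
)

trainer.train()

# Perplexity is exp(validation loss); the README's figures follow this relation.
eval_loss = trainer.evaluate()["eval_loss"]
print(f"perplexity: {math.exp(eval_loss):.2f}")
```

Note that the perplexities in the results table are simply the exponential of the corresponding validation losses: exp(3.900) ≈ 49.4 for the new run and exp(4.533) ≈ 93.0 for the old one, matching the reported 49.43 and 93.03.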