Updated README with the new model replacement
README.md
CHANGED
@@ -12,22 +12,21 @@ widget:
## Javanese BERT Small
Javanese BERT Small is a masked language model based on the [BERT model](https://arxiv.org/abs/1810.04805). It was trained on the latest (late December 2020) Javanese Wikipedia articles.

-The model was …
-… are based on a Hugging Face tutorial [notebook](https://github.com/huggingface/notebooks/blob/master/examples/language_modeling.ipynb) written by [Sylvain Gugger](https://github.com/sgugger), where Sylvain Gugger fine-tuned a [DistilGPT-2](https://huggingface.co/distilgpt2) on [Wikitext2](https://render.githubusercontent.com/view/ipynb?color_mode=dark&commit=43d63e390e8a82f7ae49aa1a877419343a213cb4&enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f68756767696e67666163652f6e6f7465626f6f6b732f343364363365333930653861383266376165343961613161383737343139333433613231336362342f6578616d706c65732f6c616e67756167655f6d6f64656c696e672e6970796e62&nwo=huggingface%2Fnotebooks&path=examples%2Flanguage_modeling.ipynb&repository_id=272452525&repository_type=Repository).
+The model was originally HuggingFace's pretrained [English BERT model](https://huggingface.co/bert-base-uncased) and was later fine-tuned on the Javanese dataset. It achieved a perplexity of 49.43 on the validation dataset (20% of the articles). Many of the techniques used are based on a Hugging Face tutorial [notebook](https://github.com/huggingface/notebooks/blob/master/examples/language_modeling.ipynb) written by [Sylvain Gugger](https://github.com/sgugger), and a [fine-tuning tutorial notebook](https://github.com/piegu/fastai-projects/blob/master/finetuning-English-GPT2-any-language-Portuguese-HuggingFace-fastaiv2.ipynb) written by [Pierre Guillou](https://huggingface.co/pierreguillou).

Hugging Face's [Transformers](https://huggingface.co/transformers) library was used to train the model, utilizing the base BERT model and its `Trainer` class. PyTorch was used as the backend framework during training, but the model remains compatible with TensorFlow nonetheless.
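
As a rough sketch of what such a `Trainer`-based masked-LM fine-tune looks like in code (not the exact script used; the input file name, sequence length, batch size, and split seed are illustrative assumptions):

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Start from the English BERT checkpoint, as described above.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Assumed input: the Wikipedia dump flattened to one article per line,
# then split 80%/20% into train/validation as the card describes.
raw = load_dataset("text", data_files="jv_wiki.txt")["train"]
splits = raw.train_test_split(test_size=0.2, seed=42)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = splits.map(tokenize, batched=True, remove_columns=["text"])

# Dynamically mask 15% of tokens, i.e. the standard BERT MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="javanese-bert-small",
    num_train_epochs=15,             # the card reports 15 epochs
    per_device_train_batch_size=16,  # assumed batch size
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=collator,
)
trainer.train()
trainer.evaluate()  # reports eval_loss; perplexity is exp(eval_loss)
```
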
## Model

-| Model | #params …
-| `javanese- …
+| Model                 | #params | Arch.      | Training/Validation data (text)     |
+|-----------------------|---------|------------|-------------------------------------|
+| `javanese-bert-small` | 110M    | BERT Small | Javanese Wikipedia (319 MB of text) |

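The checkpoint in the table can be loaded in either framework, which is what the TensorFlow compatibility mentioned above amounts to. A minimal sketch, assuming the Hub id `w11wo/javanese-bert-small` (an assumption, substitute the actual repository name) and that only PyTorch weights are published, hence `from_pt=True` on the TensorFlow side:

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer, TFAutoModelForMaskedLM

model_id = "w11wo/javanese-bert-small"  # assumed Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
pt_model = AutoModelForMaskedLM.from_pretrained(model_id)                   # PyTorch weights
tf_model = TFAutoModelForMaskedLM.from_pretrained(model_id, from_pt=True)  # converted for TensorFlow
```
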
## Evaluation Results

The model was trained for 15 epochs, and the following is the final result once training ended.

| train loss | valid loss | perplexity | total time |
|------------|------------|------------|------------|
+| 3.918      | 3.900      | 49.43      | 5:19:36    |
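
The perplexity column is simply the exponential of the validation loss, so the figures above can be sanity-checked directly:

```python
import math

valid_loss = 3.900
print(round(math.exp(valid_loss), 2))  # 49.4, in line with the reported 49.43 (the gap is loss rounding)
```
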
## How to Use
### As Masked Language Model
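
A typical fill-mask call looks like the following sketch; the Hub id `w11wo/javanese-bert-small` and the Javanese prompt here are illustrative assumptions:

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="w11wo/javanese-bert-small")  # assumed Hub id

# Javanese prompt (assumed), roughly "I am eating rice at the [MASK]."
for prediction in fill_mask("Aku lagi mangan sega ing [MASK]."):
    print(prediction["token_str"], prediction["score"])
```
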