w11wo committed bcf712c (1 parent: 98a2449)

Updated README with the new model replacement

Files changed (1):
  1. README.md (+5 -6)
README.md CHANGED
@@ -12,22 +12,21 @@ widget:
 ## Javanese BERT Small
 Javanese BERT Small is a masked language model based on the [BERT model](https://arxiv.org/abs/1810.04805). It was trained on the latest (late December 2020) Javanese Wikipedia articles.
 
-The model was trained from scratch and achieved a perplexity of 93.03 on the validation dataset (20% of the articles). Many of the techniques used
-are based on a Hugging Face tutorial [notebook](https://github.com/huggingface/notebooks/blob/master/examples/language_modeling.ipynb) written by [Sylvain Gugger](https://github.com/sgugger), where Sylvain Gugger fine-tuned a [DistilGPT-2](https://huggingface.co/distilgpt2) on [Wikitext2](https://render.githubusercontent.com/view/ipynb?color_mode=dark&commit=43d63e390e8a82f7ae49aa1a877419343a213cb4&enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f68756767696e67666163652f6e6f7465626f6f6b732f343364363365333930653861383266376165343961613161383737343139333433613231336362342f6578616d706c65732f6c616e67756167655f6d6f64656c696e672e6970796e62&nwo=huggingface%2Fnotebooks&path=examples%2Flanguage_modeling.ipynb&repository_id=272452525&repository_type=Repository).
+The model was originally Hugging Face's pretrained [English BERT model](https://huggingface.co/bert-base-uncased) and was later fine-tuned on the Javanese dataset. It achieved a perplexity of 49.43 on the validation dataset (20% of the articles). Many of the techniques used are based on a Hugging Face tutorial [notebook](https://github.com/huggingface/notebooks/blob/master/examples/language_modeling.ipynb) written by [Sylvain Gugger](https://github.com/sgugger), and a [fine-tuning tutorial notebook](https://github.com/piegu/fastai-projects/blob/master/finetuning-English-GPT2-any-language-Portuguese-HuggingFace-fastaiv2.ipynb) written by [Pierre Guillou](https://huggingface.co/pierreguillou).
 
 Hugging Face's [Transformers](https://huggingface.co/transformers) library was used to train the model -- utilizing the base BERT model and their `Trainer` class. PyTorch was used as the backend framework during training, but the model remains compatible with TensorFlow nonetheless.
 
 ## Model
-| Model                 | #params | Arch. | Training/Validation data (text)     |
-|-----------------------|---------|-------|-------------------------------------|
-| `javanese-BERT-small` | 84M     | BERT  | Javanese Wikipedia (319 MB of text) |
+| Model                 | #params | Arch.      | Training/Validation data (text)     |
+|-----------------------|---------|------------|-------------------------------------|
+| `javanese-bert-small` | 110M    | BERT Small | Javanese Wikipedia (319 MB of text) |
 
 ## Evaluation Results
 The model was trained for 15 epochs, and the table below shows the final results once training ended.
 
 | train loss | valid loss | perplexity | total time |
 |------------|------------|------------|------------|
-| 4.539      | 4.533      | 93.03      | 2:31:33    |
+| 3.918      | 3.900      | 49.43      | 5:19:36    |
 
 ## How to Use
 ### As Masked Language Model
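
The diff ends right at the `### As Masked Language Model` heading, so the usage snippet itself is not part of this commit. For reference, a minimal sketch of the kind of fill-mask call that section typically contains, using the Transformers `pipeline` API; the repository id `w11wo/javanese-bert-small` and the example sentence are assumptions (inferred from the committer and model name), not taken from the diff:

```python
from transformers import pipeline

# Assumed repository id (committer + model name); not stated in the diff itself.
MODEL_ID = "w11wo/javanese-bert-small"

# BERT-style masked language models predict the token hidden behind [MASK].
fill_mask = pipeline("fill-mask", model=MODEL_ID, tokenizer=MODEL_ID)

# Illustrative Javanese sentence with a single masked token.
for prediction in fill_mask("Aku mangan sega ing [MASK]."):
    print(f"{prediction['sequence']}  (score: {prediction['score']:.4f})")
```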
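The updated description says the model started from Hugging Face's pretrained English BERT and was fine-tuned on Javanese Wikipedia with the `Trainer` class. A rough sketch of that setup under the usual masked-LM recipe from the linked tutorial notebooks is shown below; the toy sentences, batch size, and any hyperparameter not named in the README are illustrative placeholders, not taken from the commit:

```python
import math

from datasets import Dataset
from transformers import (
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Start from the pretrained English BERT checkpoint named in the README,
# then continue training with the masked-LM objective on Javanese text.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Toy stand-in for the Javanese Wikipedia dump; the real corpus
# (319 MB of text) and its preparation are not part of this diff.
raw = Dataset.from_dict({"text": [
    "Aku mangan sega ing omah.",
    "Basa Jawa iku kalebu basa Austronesia.",
    "Kutha Yogyakarta ana ing Pulo Jawa.",
    "Dheweke lagi sinau basa Jawa.",
    "Wikipedia iku ensiklopedia bebas.",
]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
split = tokenized.train_test_split(test_size=0.2)  # README reports a 20% validation split

# Mask 15% of tokens, the standard BERT pre-training setting.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

args = TrainingArguments(
    output_dir="javanese-bert-small",
    num_train_epochs=15,            # epoch count reported in the README
    per_device_train_batch_size=8,  # illustrative value, not from the diff
)

trainer = Trainer(
    model=model,
    args=args,
    data_collator=collator,
    train_dataset=split["train"],
    eval_dataset=split["test"],
)

trainer.train()

# Perplexity is exp(validation loss); the README's figures follow this relation.
eval_loss = trainer.evaluate()["eval_loss"]
print(f"perplexity: {math.exp(eval_loss):.2f}")
```

Note that the perplexities in the results table are simply the exponential of the corresponding validation losses: exp(3.900) ≈ 49.4 for the new run and exp(4.533) ≈ 93.0 for the old one, matching the reported 49.43 and 93.03.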