Updated README.md; added OSCAR link.
Browse files
README.md
CHANGED
@@ -10,7 +10,7 @@ widget:
|
|
10 |
---
|
11 |
|
12 |
## Malaysian DistilBERT Small
|
13 |
-
Malaysian DistilBERT Small is a masked language model based on the [DistilBERT model](https://arxiv.org/abs/1910.01108). It was trained on the OSCAR dataset, specifically the `unshuffled_original_ms` subset.
|
14 |
|
15 |
The model was originally HuggingFace's pretrained [English DistilBERT model](https://huggingface.co/distilbert-base-uncased) and is later fine-tuned on the Malaysian dataset. It achieved a perplexity of 10.33 on the validation dataset (20% of the dataset). Many of the techniques used are based on a Hugging Face tutorial [notebook](https://github.com/huggingface/notebooks/blob/master/examples/language_modeling.ipynb) written by [Sylvain Gugger](https://github.com/sgugger), and [fine-tuning tutorial notebook](https://github.com/piegu/fastai-projects/blob/master/finetuning-English-GPT2-any-language-Portuguese-HuggingFace-fastaiv2.ipynb) written by [Pierre Guillou](https://huggingface.co/pierreguillou).
|
16 |
|
|
|
10 |
---
|
11 |
|
12 |
## Malaysian DistilBERT Small
|
13 |
+
Malaysian DistilBERT Small is a masked language model based on the [DistilBERT model](https://arxiv.org/abs/1910.01108). It was trained on the [OSCAR](https://huggingface.co/datasets/oscar) dataset, specifically the `unshuffled_original_ms` subset.
|
14 |
|
15 |
The model was originally HuggingFace's pretrained [English DistilBERT model](https://huggingface.co/distilbert-base-uncased) and is later fine-tuned on the Malaysian dataset. It achieved a perplexity of 10.33 on the validation dataset (20% of the dataset). Many of the techniques used are based on a Hugging Face tutorial [notebook](https://github.com/huggingface/notebooks/blob/master/examples/language_modeling.ipynb) written by [Sylvain Gugger](https://github.com/sgugger), and [fine-tuning tutorial notebook](https://github.com/piegu/fastai-projects/blob/master/finetuning-English-GPT2-any-language-Portuguese-HuggingFace-fastaiv2.ipynb) written by [Pierre Guillou](https://huggingface.co/pierreguillou).
|
16 |
|