GroNLP/bert-base-dutch-cased


wietsedv committed on Aug 25, 2021

Commit 484ff5c • 1 Parent(s): e83cd7a

Update README.md

Files changed (1)

README.md +13 -7


@@ -33,6 +33,12 @@ model = AutoModel.from_pretrained("GroNLP/bert-base-dutch-cased")  # PyTorch
 model = TFAutoModel.from_pretrained("GroNLP/bert-base-dutch-cased")  # Tensorflow
 ```
 ## Benchmarks
 The arXiv paper lists benchmarks. Here are a couple of comparisons between BERTje, multilingual BERT, BERT-NL and RobBERT that were done after writing the paper. Unlike some other comparisons, the fine-tuning procedures for these benchmarks are identical for each pre-trained model. You may be able to achieve higher scores for individual models by optimizing fine-tuning procedures.
@@ -69,12 +75,12 @@ Headers in the tables below link to original data sources. Scores link to the mo
 ```bibtex
 @misc{devries2019bertje,
-	title = {{BERTje}: {A} {Dutch} {BERT} {Model}},
-	shorttitle = {{BERTje}},
-	author = {de Vries, Wietse  and  van Cranenburgh, Andreas  and  Bisazza, Arianna  and  Caselli, Tommaso  and  Noord, Gertjan van  and  Nissim, Malvina},
-	year = {2019},
-	month = dec,
-	howpublished = {arXiv:1912.09582},
-	url = {http://arxiv.org/abs/1912.09582},
 }
 ```

 model = TFAutoModel.from_pretrained("GroNLP/bert-base-dutch-cased")  # Tensorflow
 ```
+**WARNING:** The vocabulary size of BERTje changed in 2021. If you use an older fine-tuned model and experience problems with the `GroNLP/bert-base-dutch-cased` tokenizer, use the following tokenizer:
+```python
+tokenizer = AutoTokenizer.from_pretrained("GroNLP/bert-base-dutch-cased", revision="v1")  # v1 is the old vocabulary
+```
 ## Benchmarks
 The arXiv paper lists benchmarks. Here are a couple of comparisons between BERTje, multilingual BERT, BERT-NL and RobBERT that were done after writing the paper. Unlike some other comparisons, the fine-tuning procedures for these benchmarks are identical for each pre-trained model. You may be able to achieve higher scores for individual models by optimizing fine-tuning procedures.
 ```bibtex
 @misc{devries2019bertje,
+	title = {{BERTje}: {A} {Dutch} {BERT} {Model}},
+	shorttitle = {{BERTje}},
+	author = {de Vries, Wietse  and  van Cranenburgh, Andreas  and  Bisazza, Arianna  and  Caselli, Tommaso  and  Noord, Gertjan van  and  Nissim, Malvina},
+	year = {2019},
+	month = dec,
+	howpublished = {arXiv:1912.09582},
+	url = {http://arxiv.org/abs/1912.09582},
 }
 ```
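
The warning added in this commit leaves it to the reader to notice a tokenizer problem. One way to detect it up front is sketched below, as an illustration only and not part of the commit. It assumes that a fine-tuned checkpoint whose configured vocabulary size differs from the current tokenizer's size was trained with the old vocabulary. The helper names `pick_tokenizer_revision` and `load_matching_tokenizer` and the `finetuned_path` argument are hypothetical. Only the `revision="v1"` call comes from the README.

```python
from typing import Optional

BASE_ID = "GroNLP/bert-base-dutch-cased"


def pick_tokenizer_revision(model_vocab_size: int, tokenizer_size: int) -> Optional[str]:
    """Return "v1" (the old vocabulary) on a size mismatch, else None.

    Assumption: any mismatch means the checkpoint predates the 2021
    vocabulary change. None stands for the default (current) revision.
    """
    if model_vocab_size == tokenizer_size:
        return None
    return "v1"


def load_matching_tokenizer(finetuned_path: str, base_id: str = BASE_ID):
    """Load the tokenizer whose size matches a fine-tuned checkpoint.

    `finetuned_path` is a hypothetical local directory or Hub id of a
    model fine-tuned from BERTje.
    """
    # Imported lazily so the pure helper above works without transformers.
    from transformers import AutoConfig, AutoTokenizer

    # Reading the config avoids loading the model weights.
    model_vocab_size = AutoConfig.from_pretrained(finetuned_path).vocab_size
    current = AutoTokenizer.from_pretrained(base_id)
    revision = pick_tokenizer_revision(model_vocab_size, len(current))
    if revision is None:
        return current
    return AutoTokenizer.from_pretrained(base_id, revision=revision)
```

Calling `load_matching_tokenizer("path/to/your-finetuned-model")` would then return the current tokenizer or the `v1` one. The size comparison is a heuristic: a checkpoint with a resized or padded embedding matrix can trigger the fallback even though it uses the current vocabulary, so the result is worth checking on a few sentences.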