Update README.md
README.md
CHANGED
@@ -33,6 +33,12 @@ model = AutoModel.from_pretrained("GroNLP/bert-base-dutch-cased") # PyTorch
|
|
33 |
model = TFAutoModel.from_pretrained("GroNLP/bert-base-dutch-cased") # Tensorflow
|
34 |
```
|
35 |
|
|
|
|
|
|
|
|
|
|
|
|
|
36 |
## Benchmarks
|
37 |
|
38 |
The arXiv paper lists benchmarks. Here are a couple of comparisons between BERTje, multilingual BERT, BERT-NL and RobBERT that were done after writing the paper. Unlike some other comparisons, the fine-tuning procedures for these benchmarks are identical for each pre-trained model. You may be able to achieve higher scores for individual models by optimizing fine-tuning procedures.
|
@@ -69,12 +75,12 @@ Headers in the tables below link to original data sources. Scores link to the mo
|
|
69 |
|
70 |
```bibtex
|
71 |
@misc{devries2019bertje,
|
72 |
-
|
73 |
-
|
74 |
-
|
75 |
-
|
76 |
-
|
77 |
-
|
78 |
-
|
79 |
}
|
80 |
```
|
|
|
33 |
model = TFAutoModel.from_pretrained("GroNLP/bert-base-dutch-cased") # Tensorflow
|
34 |
```
|
35 |
|
36 |
+
**WARNING:** The vocabulary size of BERTje changed in 2021. If you use an older fine-tuned model and experience problems with the `GroNLP/bert-base-dutch-cased` tokenizer, use the following tokenizer instead:
|
37 |
+
|
38 |
+
```python
|
39 |
+
tokenizer = AutoTokenizer.from_pretrained("GroNLP/bert-base-dutch-cased", revision="v1") # v1 is the old vocabulary
|
40 |
+
```
|
41 |
+
|
42 |
## Benchmarks
|
43 |
|
44 |
The arXiv paper lists benchmarks. Here are a couple of comparisons between BERTje, multilingual BERT, BERT-NL and RobBERT that were done after writing the paper. Unlike some other comparisons, the fine-tuning procedures for these benchmarks are identical for each pre-trained model. You may be able to achieve higher scores for individual models by optimizing fine-tuning procedures.
|
|
|
75 |
|
76 |
```bibtex
|
77 |
@misc{devries2019bertje,
|
78 |
+
    title = {{BERTje}: {A} {Dutch} {BERT} {Model}},
|
79 |
+
    shorttitle = {{BERTje}},
|
80 |
+
    author = {de Vries, Wietse and van Cranenburgh, Andreas and Bisazza, Arianna and Caselli, Tommaso and Noord, Gertjan van and Nissim, Malvina},
|
81 |
+
    year = {2019},
|
82 |
+
    month = dec,
|
83 |
+
    howpublished = {arXiv:1912.09582},
|
84 |
+
    url = {http://arxiv.org/abs/1912.09582},
|
85 |
}
|
86 |
```
|
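The warning added in this change tells users of older fine-tuned models to fall back to the `v1` tokenizer revision if they run into problems. One way to decide that programmatically might be to compare the vocabulary size the fine-tuned model was built with against the size of the tokenizer you loaded. The sketch below is a minimal illustration of that idea, not part of the README: the helper name `pick_tokenizer_revision` and the example sizes are hypothetical, and a size mismatch is only assumed to be a reasonable signal.

```python
from typing import Optional

# Revision name taken from the README warning; everything else here is illustrative.
LEGACY_REVISION = "v1"


def pick_tokenizer_revision(model_vocab_size: int, tokenizer_vocab_size: int) -> Optional[str]:
    """Return the legacy revision when the two sizes disagree, else None (keep the default)."""
    if model_vocab_size != tokenizer_vocab_size:
        return LEGACY_REVISION
    return None


# Hypothetical sizes, for illustration only; read the real values from your own checkpoint.
print(pick_tokenizer_revision(model_vocab_size=1000, tokenizer_vocab_size=1200))  # v1
print(pick_tokenizer_revision(model_vocab_size=1200, tokenizer_vocab_size=1200))  # None
```

With `transformers`, the two sizes could plausibly be read from `model.config.vocab_size` and `len(tokenizer)`. When the helper returns a revision, pass it as `revision=` to `AutoTokenizer.from_pretrained`, as in the snippet from the README; otherwise load the tokenizer without that argument.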