|
--- |
|
language: da |
|
widget: |
|
- text: En trend, der kan blive ligeså hot som<mask>. |
|
tags: |
|
- roberta |
|
- danish |
|
- masked-lm |
|
- pytorch |
|
license: cc-by-4.0 |
|
--- |
|
|
|
# DanskBERT |
|
|
|
This is DanskBERT, a Danish masked language model. Note that the mask token should not be preceded by a space when querying the model directly!
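
A minimal fill-mask sketch with 🤗 Transformers is shown below. The repository id `vesteinn/DanskBERT` is an assumption and should be replaced with the actual Hub id if it differs; note that `<mask>` directly follows the preceding word, with no space in between.

```python
from transformers import pipeline

# Assumed Hub repository id; replace if the actual id differs.
fill_mask = pipeline("fill-mask", model="vesteinn/DanskBERT")

# No space before <mask>: the mask token directly follows the preceding word.
for prediction in fill_mask("En trend, der kan blive ligeså hot som<mask>."):
    print(f'{prediction["token_str"]!r}: {prediction["score"]:.3f}')
```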
|
|
|
The model is the best performing base-size model on the [ScandEval benchmark for Danish](https://scandeval.github.io/nlu-benchmark/). |
|
|
|
DanskBERT was trained on the Danish Gigaword Corpus (Strømberg-Derczynski et al., 2021). |
|
|
|
DanskBERT was trained with fairseq using the RoBERTa-base configuration. Training used a batch size of 2k and ran to convergence over 500k steps on 16 V100 GPUs, taking approximately two weeks.
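
Since the checkpoint follows the RoBERTa-base configuration, it can also be loaded with the generic masked-LM classes in 🤗 Transformers. Below is a sketch of manual top-k mask filling, again assuming the `vesteinn/DanskBERT` repository id.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "vesteinn/DanskBERT"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Build the input with the model's own mask token, directly after the word.
text = f"En trend, der kan blive ligeså hot som{tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the mask position and take the five most likely tokens.
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_tokens = logits[0, mask_index].topk(5).indices[0]
print(tokenizer.convert_ids_to_tokens(top_tokens.tolist()))
```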
|
|
|
If you find this model useful, please cite:
|
|
|
``` |
|
@inproceedings{snaebjarnarson-etal-2023-transfer, |
|
title = "{T}ransfer to a Low-Resource Language via Close Relatives: The Case Study on Faroese", |
|
author = "Snæbjarnarson, Vésteinn and |
|
Simonsen, Annika and |
|
Glavaš, Goran and |
|
Vulić, Ivan", |
|
booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)", |
|
month = "may 22--24", |
|
year = "2023", |
|
address = "Tórshavn, Faroe Islands", |
|
    publisher = "Link{\"o}ping University Electronic Press, Sweden",
|
} |
|
``` |