---
language: is
widget:
- text: Má bjóða þér <mask> í kvöld?
- text: Forseti <mask> er ágæt.
- text: Súpan var <mask> á bragðið.
tags:
- roberta
- icelandic
- masked-lm
- pytorch
license: cc-by-4.0
---
# IceBERT-igc
This model was trained with fairseq using the RoBERTa-base architecture. It is one of many models we have trained for Icelandic; see the paper referenced below for further details. The training data used is shown in the table below.
| Dataset | Size | Tokens |
|------------------------------------------------------|---------|--------|
| Icelandic Gigaword Corpus v20.05 (IGC) | 8.2 GB | 1,388M |
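## Example usage
A minimal sketch of masked-token prediction with this model through the Hugging Face `transformers` fill-mask pipeline. The repository id `mideind/IceBERT-igc` is an assumption based on the model name; substitute the actual path of this model card.

```python
from transformers import pipeline

# Repository id is assumed from the model name; adjust if needed.
fill_mask = pipeline("fill-mask", model="mideind/IceBERT-igc")

# One of the widget prompts from this card; <mask> is RoBERTa's mask token.
for prediction in fill_mask("Má bjóða þér <mask> í kvöld?"):
    print(f"{prediction['token_str']!r}: {prediction['score']:.3f}")
```

Each returned prediction contains the filled-in token string and its probability score, mirroring the behavior of the inference widget defined in the frontmatter above.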
## Citation
The model is described in the paper [https://arxiv.org/abs/2201.05601](https://arxiv.org/abs/2201.05601). Please cite the paper if you make use of the model.
```bibtex
@article{DBLP:journals/corr/abs-2201-05601,
author = {V{\'{e}}steinn Sn{\ae}bjarnarson and
Haukur Barri S{\'{\i}}monarson and
P{\'{e}}tur Orri Ragnarsson and
Svanhv{\'{\i}}t Lilja Ing{\'{o}}lfsd{\'{o}}ttir and
Haukur P{\'{a}}ll J{\'{o}}nsson and
Vilhj{\'{a}}lmur {\TH}orsteinsson and
Hafsteinn Einarsson},
title = {A Warm Start and a Clean Crawled Corpus - {A} Recipe for Good Language
Models},
journal = {CoRR},
volume = {abs/2201.05601},
year = {2022},
url = {https://arxiv.org/abs/2201.05601},
eprinttype = {arXiv},
eprint = {2201.05601},
timestamp = {Thu, 20 Jan 2022 14:21:35 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-2201-05601.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```