Commit ee14a17 by asier-gutierrez
Parent: e0f9624
Update README.md
README.md CHANGED
@@ -38,7 +38,7 @@ Some of the statistics of the corpus:
 The training corpus has been tokenized using a byte version of Byte-Pair Encoding (BPE) used in the original [RoBERTA](https://arxiv.org/abs/1907.11692) model with a vocabulary size of 50,262 tokens. The RoBERTa-base-bne pre-training consists of a masked language model training that follows the approach employed for the RoBERTa base. The training lasted a total of 48 hours with 16 computing nodes each one with 4 NVIDIA V100 GPUs of 16GB VRAM.
 
 ## Evaluation and results
-For evaluation details visit our [GitHub repository](https://github.com/PlanTL-
+For evaluation details visit our [GitHub repository](https://github.com/PlanTL-GOB-ES/lm-spanish).
 
 ## Citing
 Check out our paper for all the details: https://arxiv.org/abs/2107.07253