Update README.md
README.md CHANGED
@@ -109,8 +109,7 @@ torch.Size([1, 19, 768])
 
 You can use the raw model for fill mask or fine-tune it to a downstream task.
 
-The training data used for this model has not been released as a dataset one can browse. We know it contains a lot of
-unfiltered content from the internet, which is far from neutral. Here's an example of how the model can have biased predictions:
+The training data used for this model has not been released as a dataset one can browse. We know it contains a lot of unfiltered content from the internet, which is far from neutral. At the time of submission, no measures have been taken to estimate the bias and toxicity embedded in the model. However, we are well aware that our models may be biased since the corpora have been collected using crawling techniques on multiple web sources. We intend to conduct research in these areas in the future, and if completed, this model card will be updated. Nevertheless, here's an example of how the model can have biased predictions:
 
 ```python
 >>> from transformers import pipeline, set_seed
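The fill-mask example is cut off at the hunk boundary. A minimal sketch of how such a bias check might look with the `transformers` pipeline is below; the checkpoint name `PlanTL-GOB-ES/roberta-base-bne` and the Spanish prompts are assumptions, not part of the diff.

```python
>>> from transformers import pipeline, set_seed
>>> set_seed(42)
>>> # Checkpoint name is assumed; substitute the identifier used in the model card.
>>> unmasker = pipeline("fill-mask", model="PlanTL-GOB-ES/roberta-base-bne")
>>> # Compare the top candidate fills for a masked slot across gendered subjects.
>>> unmasker("La mujer trabaja como <mask>.")
>>> unmasker("El hombre trabaja como <mask>.")
```

Each call returns the top-k candidate fills with their scores, which is enough to spot systematic differences between the two prompts.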
@@ -181,6 +180,7 @@ Some of the statistics of the corpus:
 ### Training Procedure
 The configuration of the **RoBERTa-base-bne** model is as follows:
 - RoBERTa-b: 12-layer, 768-hidden, 12-heads, 125M parameters.
+
 The pretraining objective used for this architecture is masked language modeling without next sentence prediction.
 The training corpus has been tokenized using a byte version of Byte-Pair Encoding (BPE) used in the original [RoBERTa](https://arxiv.org/abs/1907.11692) model with a vocabulary size of 50,262 tokens.
 The RoBERTa-base-bne pre-training consists of a masked language model training that follows the approach employed for the RoBERTa base. The training lasted a total of 48 hours with 16 computing nodes, each with 4 NVIDIA V100 GPUs of 16GB VRAM.
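The architecture and vocabulary figures quoted in this hunk (12 layers, 768 hidden size, 12 heads, a 50,262-token BPE vocabulary) can be read back from the published configuration and tokenizer. A minimal sketch, assuming the checkpoint is available as `PlanTL-GOB-ES/roberta-base-bne`:

```python
>>> from transformers import AutoConfig, AutoTokenizer
>>> # Checkpoint name is an assumption; use the identifier published with this model card.
>>> config = AutoConfig.from_pretrained("PlanTL-GOB-ES/roberta-base-bne")
>>> config.num_hidden_layers, config.hidden_size, config.num_attention_heads  # should match 12 / 768 / 12
>>> tokenizer = AutoTokenizer.from_pretrained("PlanTL-GOB-ES/roberta-base-bne")
>>> len(tokenizer)  # should be close to the 50,262-token byte-level BPE vocabulary described above
```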