Update README.md
README.md
CHANGED
@@ -155,7 +155,9 @@ Users (both direct and downstream) should be made aware of the risks, biases and
 
 # Training
 
-This model is the XLM model trained on Wikipedia text in 100 languages. The preprocessing included tokenization
+This model is the XLM model trained on Wikipedia text in 100 languages. The preprocessing included tokenization with byte-pair-encoding. See the [GitHub repo](https://github.com/facebookresearch/XLM#the-17-and-100-languages) and the [associated paper](https://arxiv.org/pdf/1911.02116.pdf) for further details on the training data and training procedure.
+
+[Conneau et al. (2020)](https://arxiv.org/pdf/1911.02116.pdf) report that this model has 16 layers, 1280 hidden states, 16 attention heads, and the dimension of the feed-forward layer is 1520. The vocabulary size is 200k and the total number of parameters is 570M (see Table 7).
 
 # Evaluation
 
@@ -183,6 +185,10 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 - **Compute Region:** More information needed
 - **Carbon Emitted:** More information needed
 
+# Technical Specifications
+
+[Conneau et al. (2020)](https://arxiv.org/pdf/1911.02116.pdf) report that this model has 16 layers, 1280 hidden states, 16 attention heads, and the dimension of the feed-forward layer is 1520. The vocabulary size is 200k and the total number of parameters is 570M (see Table 7).
+
 # Citation
 
 **BibTeX:**
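The architecture figures quoted in the added paragraphs can be sanity-checked against the released checkpoint. Below is a minimal sketch, assuming the card describes the checkpoint published on the Hugging Face Hub under the ID `xlm-mlm-100-1280` (an assumption, adjust the ID if the repository is named differently) and that the `transformers` library with a PyTorch backend is installed:

```python
from transformers import XLMConfig, XLMTokenizer, XLMWithLMHeadModel

# Hypothetical model ID for the 100-language, 1280-hidden-size XLM checkpoint.
model_id = "xlm-mlm-100-1280"

# The config exposes the hyperparameters the card quotes from Conneau et al. (2020).
config = XLMConfig.from_pretrained(model_id)
print("layers:", config.n_layers)             # expected: 16
print("hidden size:", config.emb_dim)         # expected: 1280
print("attention heads:", config.n_heads)     # expected: 16
print("vocabulary size:", config.vocab_size)  # expected: ~200k BPE tokens

# Load the tokenizer and the masked-LM head, then count parameters.
tokenizer = XLMTokenizer.from_pretrained(model_id)
model = XLMWithLMHeadModel.from_pretrained(model_id)
print("parameters:", sum(p.numel() for p in model.parameters()))  # roughly 570M
```

This only inspects the released configuration and weights; for the training data and procedure themselves, the GitHub repo and paper linked in the diff remain the authoritative sources.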