Update README.md
README.md
CHANGED
@@ -31,9 +31,10 @@ license: cc-by-nc-4.0
 4. [Training](#training)
 5. [Evaluation](#evaluation)
 6. [Environmental Impact](#environmental-impact)
-7. [
-8. [
-9. [
+7. [Technical Specifications](#technical-specifications)
+8. [Citation](#citation)
+9. [Model Card Authors](#model-card-authors)
+10. [How To Get Started With the Model](#how-to-get-started-with-the-model)
 
 
 # Model Details
@@ -77,6 +78,8 @@ Users (both direct and downstream) should be made aware of the risks, biases and
 # Training
 
 This model is the XLM model trained on text in 17 languages. The preprocessing included tokenization and byte-pair-encoding. See the [GitHub repo](https://github.com/facebookresearch/XLM#the-17-and-100-languages) and the [associated paper](https://arxiv.org/pdf/1911.02116.pdf) for further details on the training data and training procedure.
+
+[Conneau et al. (2020)](https://arxiv.org/pdf/1911.02116.pdf) report that this model has 16 layers, 1280 hidden states, 16 attention heads, and the dimension of the feed-forward layer is 1520. The vocabulary size is 200k and the total number of parameters is 570M (see Table 7).
 
 # Evaluation
 
@@ -104,6 +107,10 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 - **Compute Region:** More information needed
 - **Carbon Emitted:** More information needed
 
+# Technical Specifications
+
+[Conneau et al. (2020)](https://arxiv.org/pdf/1911.02116.pdf) report that this model has 16 layers, 1280 hidden states, 16 attention heads, and the dimension of the feed-forward layer is 1520. The vocabulary size is 200k and the total number of parameters is 570M (see Table 7).
+
 # Citation
 
 **BibTeX:**
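The Technical Specifications paragraph added above reports concrete architecture figures, which can be checked against the checkpoint configuration. A minimal sketch, assuming the `transformers` library and the `xlm-mlm-17-1280` checkpoint id (this diff does not name the checkpoint):

```python
# Sanity-check the reported architecture against the checkpoint config.
# The checkpoint id below is an assumption; the diff does not name it.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("xlm-mlm-17-1280")
print(config.n_layers)    # reported: 16 layers
print(config.emb_dim)     # reported: 1280 hidden states
print(config.n_heads)     # reported: 16 attention heads
print(config.vocab_size)  # reported: ~200k vocabulary
# The embedding table alone accounts for roughly
# 200,000 * 1,280 = 256M of the reported 570M total parameters.
```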
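The table of contents now also points to a How To Get Started With the Model section. A minimal usage sketch for that section, assuming the same `xlm-mlm-17-1280` checkpoint and that it was trained with masked language modeling (hence the fill-mask pipeline):

```python
# A minimal smoke test via the fill-mask pipeline; the checkpoint id and
# the masked-language-modeling objective are assumptions, not stated here.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="xlm-mlm-17-1280")
# Use the tokenizer's own mask token rather than hard-coding one.
text = f"Hello, I am a {unmasker.tokenizer.mask_token} model."
print(unmasker(text))
```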