Commit 9a94f2f (parent: ade8c81): Update README.md

README.md — CHANGED
@@ -43,10 +43,8 @@ by Meta AI and pretrained on the [CC100 multilingual dataset](https://huggingfac
 It was then fine-tuned on the [XNLI dataset](https://huggingface.co/datasets/xnli), which contains hypothesis-premise pairs from 15 languages,
 as well as the English [MNLI dataset](https://huggingface.co/datasets/multi_nli).
 XLM-V-base was published on 23.01.2023 in [this paper](https://arxiv.org/pdf/2301.10472.pdf).
-Its main innovation is a larger vocabulary: previous multilingual models had a vocabulary of 250 000 tokens,
-while XLM-V
-
-[mDeBERTa-v3](https://arxiv.org/pdf/2111.09543.pdf).
+Its main innovation is a larger and better vocabulary: previous multilingual models had a vocabulary of 250 000 tokens,
+while XLM-V 'knows' 1 million tokens. The improved vocabulary allows for better representations of more languages.
 
 
 ### How to use the model
@@ -89,7 +87,7 @@ Note that the XNLI contains a training set of 15 machine translated versions of
 but due to quality issues with these machine translations, this model was only trained on the professional translations
 from the XNLI development set and the original English MNLI training set (392 702 texts).
 Not using machine-translated texts can avoid overfitting the model to the 15 languages;
-avoids catastrophic forgetting of the other
+avoids catastrophic forgetting of the other ~101 languages XLM-V was pre-trained on;
 and significantly reduces training costs.
 
 ### Training procedure
@@ -104,6 +102,7 @@ training_args = TrainingArguments(
     weight_decay=0.01,  # strength of weight decay
 )
 ```
+
 ### Eval results
 The model was evaluated on the XNLI test set on 15 languages (5010 texts per language, 75150 in total).
 Note that multilingual NLI models are capable of classifying NLI texts without receiving NLI training data
@@ -111,25 +110,36 @@ in the specific language (cross-lingual transfer). This means that the model is
 the other ~101 languages XLM-V was trained on, but performance is most likely lower than for those languages available in XNLI.
 
 Also note that if other multilingual models on the model hub claim performance of around 90% on languages other than English,
-the authors have most likely made a mistake during testing since non of the latest papers shows a multilingual average performance
+the authors have most likely made a mistake during testing, since none of the latest papers (on mostly larger models) shows a multilingual average performance
 of more than a few points above 80% on XNLI (see [here](https://arxiv.org/pdf/2111.09543.pdf) or [here](https://arxiv.org/pdf/1911.02116.pdf)).
 
-average
-
-
+The average XNLI performance of XLM-V reported in the paper is 0.76 ([see table 2](https://arxiv.org/pdf/2301.10472.pdf)).
+This reimplementation has an average performance of 0.78.
+This increase in performance is probably thanks to the addition of MNLI in the training data.
+Note that [mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) has an average
+performance of 0.808, is smaller (3 GB for XLM-V vs. 560 MB for mDeBERTa), and is faster (thanks to mDeBERTa's smaller vocabulary).
+This difference probably comes from mDeBERTa-v3's improved pre-training objective.
+Depending on the task, it is probably better to use [mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli),
+but XLM-V could be better on some languages based on its improved vocabulary.
 
+|Datasets|ar|bg|de|el|en|es|fr|hi|ru|sw|th|tr|ur|vi|zh|average|
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+|Accuracy|0.757|0.808|0.796|0.79|0.856|0.814|0.806|0.751|0.782|0.725|0.757|0.766|0.729|0.784|0.782|0.780|
+|Speed (text/sec)|3501.0|3324.0|3438.0|3174.0|3713.0|3500.0|3129.0|3042.0|3419.0|3468.0|3782.0|3772.0|3099.0|3117.0|4217.0|na|
 
-|Datasets|mnli_m|mnli_mm|
-| :---: | :---: | :---: |
-|Accuracy|0.852|0.854|
-|Speed (text/sec)|2098.0|2170.0|
+|Datasets|mnli_m|mnli_mm|
+| :---: | :---: | :---: |
+|Accuracy|0.852|0.854|
+|Speed (text/sec)|2098.0|2170.0|
 
 
 ## Limitations and bias
 Please consult the original XLM-V paper and literature on different NLI datasets for potential biases.
 
 ## Citation
-If you use this model, please cite: Laurer, Moritz, Wouter van Atteveldt, Andreu Salleras Casas, and Kasper Welbers. 2022.
+If you use this model, please cite: Laurer, Moritz, Wouter van Atteveldt, Andreu Salleras Casas, and Kasper Welbers. 2022.
+‘Less Annotating, More Classifying – Addressing the Data Scarcity Issue of Supervised Machine Learning with Deep Transfer Learning and BERT-NLI’.
+Preprint, June. Open Science Framework. https://osf.io/74b8k.
 
 ## Ideas for cooperation or questions?
 If you have questions or ideas for cooperation, contact me at m{dot}laurer{at}vu{dot}nl or [LinkedIn](https://www.linkedin.com/in/moritz-laurer/)
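The card's "How to use the model" section is not part of this diff, but the NLI mechanism it relies on can be sketched independently: the model scores a premise against one hypothesis per candidate label and outputs three logits (entailment, neutral, contradiction), and the entailment probabilities rank the labels. The logits below are made-up illustrative numbers, not real model outputs:

```python
import math

# NLI models of this kind output three logits per premise-hypothesis pair,
# conventionally ordered (entailment, neutral, contradiction).
NLI_LABELS = ("entailment", "neutral", "contradiction")

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entailment_prob(logits):
    """Probability mass the model puts on 'entailment' (index 0)."""
    return softmax(logits)[0]

# Hypothetical logits for one premise scored against two label hypotheses,
# e.g. "This example is about politics." / "This example is about sports."
logits_politics = [2.1, 0.3, -1.5]
logits_sports = [-1.0, 0.2, 1.9]

scores = {
    "politics": entailment_prob(logits_politics),
    "sports": entailment_prob(logits_sports),
}
best = max(scores, key=scores.get)  # the label whose hypothesis is most entailed
```

In a real run, the two logit vectors would come from the fine-tuned model; everything after that point (softmax, ranking by entailment probability) is exactly the zero-shot classification step the card describes.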
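The "average" column added to the XNLI table can be reproduced from the per-language accuracies; a quick check that the reported 0.780 is the plain mean of the 15 values, and that the test-set size matches the stated 75150 texts:

```python
# Per-language XNLI accuracies copied from the eval table in the card.
accuracies = {
    "ar": 0.757, "bg": 0.808, "de": 0.796, "el": 0.790, "en": 0.856,
    "es": 0.814, "fr": 0.806, "hi": 0.751, "ru": 0.782, "sw": 0.725,
    "th": 0.757, "tr": 0.766, "ur": 0.729, "vi": 0.784, "zh": 0.782,
}

average = sum(accuracies.values()) / len(accuracies)
print(round(average, 3))  # → 0.78, matching the table's "average" column

# The card states 5010 test texts per language, 75150 in total.
total_texts = len(accuracies) * 5010
```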