MoritzLaurer committed
Commit 3b4010c
Parent: fb2c3fc

Update README.md

Files changed (1): README.md (+8, -5)
README.md CHANGED
@@ -48,12 +48,14 @@ as well as the English [MNLI dataset](https://huggingface.co/datasets/multi_nli)
The main advantage of distilled models is that they are smaller (faster inference, lower memory requirements) than their teachers (XLM-RoBERTa-large).
The disadvantage is that they lose some of the performance of their larger teachers.

+ For highest inference speed, I recommend using this 6-layer model. For higher performance I recommend
+ [mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) (as of 14.02.2023).

### How to use the model
#### Simple zero-shot classification pipeline
```python
from transformers import pipeline
- classifier = pipeline("zero-shot-classification", model="MoritzLaurer/xlm-v-base-mnli-xnli")
+ classifier = pipeline("zero-shot-classification", model="MoritzLaurer/multilingual-MiniLMv2-L6-mnli-xnli")

sequence_to_classify = "Angela Merkel ist eine Politikerin in Deutschland und Vorsitzende der CDU"
candidate_labels = ["politics", "economy", "entertainment", "environment"]
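# The hunk is cut off here by the diff context. A minimal sketch of how the call
# typically continues with the standard zero-shot-classification pipeline API
# (multi_label=False assigns exactly one of the candidate labels):
output = classifier(sequence_to_classify, candidate_labels, multi_label=False)
print(output)  # dict with "sequence", "labels" and "scores" (labels sorted by score)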
@@ -66,7 +68,7 @@ from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

- model_name = "MoritzLaurer/xlm-v-base-mnli-xnli"
+ model_name = "MoritzLaurer/multilingual-MiniLMv2-L6-mnli-xnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
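# The hunk ends with the model loaded. A sketch of how NLI scoring usually
# proceeds with such a checkpoint; the hypothesis below and the label order are
# illustrative assumptions (check model.config.id2label for the real mapping):
premise = "Angela Merkel ist eine Politikerin in Deutschland und Vorsitzende der CDU"
hypothesis = "Angela Merkel is a politician in Germany"
inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt").to(device)
with torch.no_grad():
    logits = model.to(device)(**inputs).logits
probs = torch.softmax(logits[0], dim=-1).tolist()
label_names = ["entailment", "neutral", "contradiction"]  # assumed order
print({name: round(p, 3) for name, p in zip(label_names, probs)})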
 
@@ -93,7 +95,8 @@ avoids catastrophic forgetting of the other languages it was pre-trained on;
and significantly reduces training costs.

### Training procedure
- The model was trained using the Hugging Face trainer with the following hyperparameters.
+ The model was trained using the Hugging Face trainer with the following hyperparameters.
+ The exact underlying model is [mMiniLMv2-L6-H384-distilled-from-XLMR-Large](https://huggingface.co/nreimers/mMiniLMv2-L6-H384-distilled-from-XLMR-Large).
```
training_args = TrainingArguments(
num_train_epochs=3, # total number of training epochs
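# ... (the remaining hyperparameters are cut off by this diff hunk)
)
# Sketch of how such arguments are typically wired into the Hugging Face Trainer;
# model, train_dataset and eval_dataset are placeholders standing for the
# fine-tuned model and the tokenized MNLI/XNLI splits, not values from this diff:
from transformers import Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # placeholder: tokenized MNLI + XNLI training data
    eval_dataset=eval_dataset,    # placeholder: tokenized XNLI validation data
)
trainer.train()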
@@ -119,13 +122,13 @@ XLM-RoBERTa-large instead of -base (multilingual-MiniLM-L6-v2).
|Datasets|avg_xnli|ar|bg|de|el|en|es|fr|hi|ru|sw|th|tr|ur|vi|zh|
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|Accuracy|0.713|0.687|0.742|0.719|0.723|0.789|0.748|0.741|0.691|0.714|0.642|0.699|0.696|0.664|0.723|0.721|
- |Speed (text/sec)|6093.0|6210.0|6003.0|6053.0|5409.0|6531.0|6205.0|5615.0|5734.0|5970.0|6219.0|6289.0|6533.0|5851.0|5970.0|6798.0|
+ |Speed text/sec (A100 GPU, eval_batch=120)|6093.0|6210.0|6003.0|6053.0|5409.0|6531.0|6205.0|5615.0|5734.0|5970.0|6219.0|6289.0|6533.0|5851.0|5970.0|6798.0|


|Datasets|mnli_m|mnli_mm|
| :---: | :---: | :---: |
|Accuracy|0.782|0.8|
- |Speed (text/sec)|4430.0|4395.0|
+ |Speed text/sec (A100 GPU, eval_batch=120)|4430.0|4395.0|
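The updated table headers document the measurement setup (A100 GPU, evaluation batch size 120). The benchmarking code itself is not part of this diff; the sketch below shows one rough way to measure a comparable text/sec figure with the zero-shot pipeline, where the sample texts, the device index and the batch size are illustrative assumptions rather than the author's actual setup.

```python
import time
from transformers import pipeline

# Illustrative throughput measurement; model name taken from the diff,
# batch size mirroring the eval_batch=120 noted in the new table header.
classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/multilingual-MiniLMv2-L6-mnli-xnli",
    device=0,        # assumes a CUDA GPU is available
    batch_size=120,
)

texts = ["Angela Merkel ist eine Politikerin in Deutschland und Vorsitzende der CDU"] * 1200
candidate_labels = ["politics", "economy", "entertainment", "environment"]

start = time.perf_counter()
classifier(texts, candidate_labels=candidate_labels)
elapsed = time.perf_counter() - start
print(f"{len(texts) / elapsed:.1f} texts/sec")  # rough estimate, not the reported benchmark
```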
 
 