Files changed (1)
  1. README.md +57 -15
README.md CHANGED
@@ -4,6 +4,9 @@ language:
  license: apache-2.0
  tags:
  - generated_from_trainer
  datasets:
  - glue
  metrics:
@@ -72,10 +75,10 @@ should probably proofread and complete it, then remove this comment. -->

  # bert-base-uncased-mrpc

- This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the **GLUE MRPC dataset**.

- It is a pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository.
- This model is uncased: it does not make a difference between **"english"** and **"English"**.
  BERT base model (uncased)

  It provides:
@@ -107,18 +110,26 @@ The following hyperparameters were used during training:
  - Datasets 1.14.0
  - Tokenizers 0.11.6

- - # To use:
  ```python
  from transformers import BertTokenizer, BertModel
  tokenizer = BertTokenizer.from_pretrained('Intel/bert-base-uncased-mrpc')
  model = BertModel.from_pretrained("Intel/bert-base-uncased-mrpc")
- # text = "according to the theory of aerodynamics and wind tunnel experiments the bumble bee is unable to fly. This is bcause the size, weight, and shape of his body in relation to total wingspread makes flying impossible. But, the bumble bee being ignorant of these pround scientific truths goes ahead and flies anyway, and manages to make a little honey everyday."
  text = "The inspector analyzed the soundness in the building."
  encoded_input = tokenizer(text, return_tensors='pt')
  output = model(**encoded_input)
  # print BaseModelOutputWithPoolingAndCrossAttentions and pooler_output
- # output similar to:
  ```
  BaseModelOutputWithPoolingAndCrossAttentions(last_hidden_state=tensor([[[ 0.0219, 0.1258, -0.8529, ..., 0.6416, 0.6275, 0.5583],
  [ 0.3125, -0.1921, -0.9895, ..., 0.6069, 1.8431, -0.5939],
  [ 0.6147, -0.6098, -0.3517, ..., -0.1145, 1.1748, -0.7104],
@@ -130,14 +141,45 @@ BaseModelOutputWithPoolingAndCrossAttentions(last_hidden_state=tensor([[[ 0.0219
  -0.9176, -0.9994, 0.2962, 0.2891, -0.3301, 0.8786, 0.9234, -0.7643,
  0.2487, -0.5245, -0.0649, -0.6722, 0.8550, 1.0000, -0.7785, 0.5322,
  0.6056, 0.4622, 0.2838, 0.5501, 0.6981, 0.2597, -0.7896, -0.1189,

- ```python
- # Print tokens * ids in of inmput string below
- print('Tokenized Text: ', tokenizer.tokenize(text), '\n')
- print('Token IDs: ', tokenizer.convert_tokens_to_ids(tokenizer.tokenize(text)))

- #Print tokens in text
- encoded_input['input_ids'][0]
- tokenizer.convert_ids_to_tokens(encoded_input['input_ids'][0])
- ```
-

  license: apache-2.0
  tags:
  - generated_from_trainer
+ - bert-base-uncased
+ - text-classification
+ - fp32
  datasets:
  - glue
  metrics:
 
  # bert-base-uncased-mrpc

+ This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the **GLUE MRPC dataset**. The GLUE MRPC dataset, the [Microsoft Research Paraphrase Corpus (Dolan & Brockett, 2005)](https://www.tensorflow.org/datasets/catalog/glue), is a corpus of sentence pairs automatically extracted from online news sources, with human annotations for whether the sentences in each pair are semantically equivalent.
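+
+ As a minimal sketch of that sentence-pair task (the example sentences and variable names below are illustrative, and the label names are read from the model config rather than assumed):
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ tokenizer = AutoTokenizer.from_pretrained("Intel/bert-base-uncased-mrpc")
+ model = AutoModelForSequenceClassification.from_pretrained("Intel/bert-base-uncased-mrpc")
+
+ # MRPC is a sentence-pair task, so both sentences are encoded together.
+ sentence1 = "The company said quarterly profit rose 12 percent."
+ sentence2 = "Quarterly profit at the company increased by 12 percent, it said."
+ inputs = tokenizer(sentence1, sentence2, return_tensors="pt")
+
+ with torch.no_grad():
+     logits = model(**inputs).logits
+
+ # Highest-scoring class, named via the model's own label mapping.
+ predicted_class = logits.argmax(dim=-1).item()
+ print(model.config.id2label[predicted_class])
+ ```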

+ The base model, bert-base-uncased, is pretrained on English text using a masked language modeling (MLM) objective; it was introduced in the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).
+ This model, bert-base-uncased-mrpc, is uncased: it does not make a difference between **"english"** and **"English"**. Masked language modeling predicts a masked token in a sequence, and the model can attend to tokens bidirectionally, so it has full access to the tokens on both the left and the right of the mask. This makes it well suited to tasks that require a good contextual understanding of an entire sequence; BERT is an example of a masked language model. Pretraining with MLM does not require labels (it is often described as an unsupervised, or self-supervised, task) because the masked tokens themselves serve as the labels.
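+
+ As an illustration of the pretraining objective only, here is a sketch using the fill-mask pipeline on the base bert-base-uncased checkpoint (not on this MRPC fine-tuned model; the sentence is made up):
+ ```python
+ from transformers import pipeline
+
+ # The [MASK] token is predicted from the context on both sides.
+ unmasker = pipeline("fill-mask", model="bert-base-uncased")
+ for prediction in unmasker("The inspector analyzed the [MASK] of the building."):
+     print(prediction["token_str"], round(prediction["score"], 3))
+ ```
+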
  BERT base model (uncased)

  It provides:
 
  - Datasets 1.14.0
  - Tokenizers 0.11.6

+ # To use:
  ```python
  from transformers import BertTokenizer, BertModel
  tokenizer = BertTokenizer.from_pretrained('Intel/bert-base-uncased-mrpc')
  model = BertModel.from_pretrained("Intel/bert-base-uncased-mrpc")

  text = "The inspector analyzed the soundness in the building."
  encoded_input = tokenizer(text, return_tensors='pt')
  output = model(**encoded_input)
  # Print the BaseModelOutputWithPoolingAndCrossAttentions (last_hidden_state and pooler_output)
+ print(output)
+
+ # Print the tokens and token ids of the input string
+ print('Tokenized Text: ', tokenizer.tokenize(text), '\n')
+ print('Token IDs: ', tokenizer.convert_tokens_to_ids(tokenizer.tokenize(text)))
+
+ # Map the encoded input ids (which include the special [CLS] and [SEP] tokens) back to tokens
+ print(encoded_input['input_ids'][0])
+ print(tokenizer.convert_ids_to_tokens(encoded_input['input_ids'][0]))
  ```
+ # Output similar to:
+ ```python
  BaseModelOutputWithPoolingAndCrossAttentions(last_hidden_state=tensor([[[ 0.0219, 0.1258, -0.8529, ..., 0.6416, 0.6275, 0.5583],
  [ 0.3125, -0.1921, -0.9895, ..., 0.6069, 1.8431, -0.5939],
  [ 0.6147, -0.6098, -0.3517, ..., -0.1145, 1.1748, -0.7104],
 
  -0.9176, -0.9994, 0.2962, 0.2891, -0.3301, 0.8786, 0.9234, -0.7643,
  0.2487, -0.5245, -0.0649, -0.6722, 0.8550, 1.0000, -0.7785, 0.5322,
  0.6056, 0.4622, 0.2838, 0.5501, 0.6981, 0.2597, -0.7896, -0.1189,
+ ```
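+
+ To make the structure of that output concrete, here is a minimal sketch (same model, an illustrative sentence; the exact tensor values will differ) of how the two fields can be accessed:
+ ```python
+ from transformers import BertTokenizer, BertModel
+
+ tokenizer = BertTokenizer.from_pretrained('Intel/bert-base-uncased-mrpc')
+ model = BertModel.from_pretrained('Intel/bert-base-uncased-mrpc')
+
+ encoded_input = tokenizer("The inspector analyzed the soundness in the building.", return_tensors='pt')
+ output = model(**encoded_input)
+
+ # last_hidden_state: one 768-dimensional vector per input token.
+ print(output.last_hidden_state.shape)  # torch.Size([1, <sequence_length>, 768])
+ # pooler_output: a single pooled vector for the whole sequence.
+ print(output.pooler_output.shape)      # torch.Size([1, 768])
+ ```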

+ # Related work on quantization-aware training
+ An INT8 quantized version of this model can be found at this [link](https://huggingface.co/Intel/bert-base-uncased-mrpc-int8-qat-inc).
+
+ That quantized model is an INT8 PyTorch model produced with huggingface/optimum-intel through the use of Intel® Neural Compressor.
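+
+ As a sketch only (this assumes a recent optimum-intel release that exposes `INCModelForSequenceClassification`; check the linked model card for the exact, supported loading code, and note the sentence pair below is illustrative):
+ ```python
+ from optimum.intel import INCModelForSequenceClassification
+ from transformers import AutoTokenizer
+
+ # Hypothetical loading sketch for the INT8 checkpoint; the class name assumes a recent optimum-intel.
+ model_id = "Intel/bert-base-uncased-mrpc-int8-qat-inc"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ int8_model = INCModelForSequenceClassification.from_pretrained(model_id)
+
+ inputs = tokenizer("He said it today.", "He said it earlier today.", return_tensors="pt")
+ print(int8_model(**inputs).logits)
+ ```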
+
+ # Ethical Considerations and Limitations
+ bert-base-uncased-mrpc can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.
+
+ Therefore, before deploying any applications of bert-base-uncased-mrpc, developers should perform safety testing.
+
+ # Caveats and Recommendations
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
+
+ Here are a couple of useful links to learn more about Intel's AI software:
+
+ - Intel Neural Compressor [link](https://github.com/intel/neural-compressor)
+ - Intel Extension for Transformers [link](https://github.com/intel/intel-extension-for-transformers)
+
+ # Disclaimer
+ The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
+
+ # BibTeX entry and citation info
+ ```bibtex
+ @article{DBLP:journals/corr/abs-1810-04805,
+   author    = {Jacob Devlin and
+                Ming{-}Wei Chang and
+                Kenton Lee and
+                Kristina Toutanova},
+   title     = {{BERT:} Pre-training of Deep Bidirectional Transformers for Language
+                Understanding},
+   journal   = {CoRR},
+   volume    = {abs/1810.04805},
+   year      = {2018},
+   url       = {http://arxiv.org/abs/1810.04805},
+   archivePrefix = {arXiv},
+   eprint    = {1810.04805},
+   timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},
+   biburl    = {https://dblp.org/rec/journals/corr/abs-1810-04805.bib},
+   bibsource = {dblp computer science bibliography, https://dblp.org}
+ }
+ ```