Update code samples
README.md CHANGED
@@ -10,8 +10,7 @@ datasets:
 
 Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in
 [this paper](https://arxiv.org/abs/1810.04805) and first released in
-[this repository](https://github.com/google-research/bert). This model is cased: it makes a difference
-between english and English.
+[this repository](https://github.com/google-research/bert). This model is cased: it makes a difference between english and English.
 
 Differently to other BERT models, this model was trained with a new technique: Whole Word Masking. In this case, all of the tokens corresponding to a word are masked at once. The overall masking rate remains the same.
 
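The Whole Word Masking technique described in this hunk is easy to picture in code. Below is a minimal sketch (not the actual pretraining code), assuming WordPiece-style `##` continuation markers; the `whole_word_mask` helper is a hypothetical name for illustration:

```python
import random

def whole_word_mask(tokens, mask_rate=0.15, mask_token="[MASK]"):
    """Mask all WordPiece sub-tokens of a word together.

    Sub-tokens prefixed with '##' belong to the preceding word,
    so they are grouped with it and masked as a unit.
    """
    # Group sub-token indices by the word they belong to.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])

    # Mask whole words until roughly mask_rate of tokens are covered.
    budget = max(1, round(len(tokens) * mask_rate))
    masked = list(tokens)
    for word in random.sample(words, len(words)):
        if budget <= 0:
            break
        for i in word:
            masked[i] = mask_token
        budget -= len(word)
    return masked

# "philammon" might split into ["phil", "##am", "##mon"]; all three
# pieces are masked together, never just one of them.
print(whole_word_mask(["the", "actor", "phil", "##am", "##mon", "smiled"]))
```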
@@ -59,32 +58,36 @@ You can use this model directly with a pipeline for masked language modeling:
 >>> unmasker = pipeline('fill-mask', model='bert-large-cased-whole-word-masking')
 >>> unmasker("Hello I'm a [MASK] model.")
 [
-    …
+    {
+        "sequence":"[CLS] Hello I'm a fashion model. [SEP]",
+        "score":0.1474294513463974,
+        "token":4633,
+        "token_str":"fashion"
+    },
+    {
+        "sequence":"[CLS] Hello I'm a magazine model. [SEP]",
+        "score":0.05430116504430771,
+        "token":2435,
+        "token_str":"magazine"
+    },
+    {
+        "sequence":"[CLS] Hello I'm a male model. [SEP]",
+        "score":0.039395421743392944,
+        "token":2581,
+        "token_str":"male"
+    },
+    {
+        "sequence":"[CLS] Hello I'm a former model. [SEP]",
+        "score":0.036936815828084946,
+        "token":1393,
+        "token_str":"former"
+    },
+    {
+        "sequence":"[CLS] Hello I'm a professional model. [SEP]",
+        "score":0.03663451969623566,
+        "token":1848,
+        "token_str":"professional"
+    }
 ]
 ```
 
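The updated sample above uses the `fill-mask` pipeline helper. For reference, a rough equivalent without the pipeline, assuming `torch` and `transformers` are installed; the top-5 selection mirrors the sample output:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "bert-large-cased-whole-word-masking"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

inputs = tokenizer("Hello I'm a [MASK] model.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Position of the [MASK] token in the input sequence.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
probs = logits[0, mask_pos].softmax(dim=-1)
top = probs.topk(5)
for score, token_id in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{tokenizer.convert_ids_to_tokens(token_id):>15}  {score:.4f}")
```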
@@ -121,68 +124,69 @@ predictions:
 >>> unmasker("The man worked as a [MASK].")
 [
     {
-        "sequence":"[CLS] …",
-        "score":0.…,
-        "token":…,
-        "token_str":"…"
+        "sequence":"[CLS] The man worked as a carpenter. [SEP]",
+        "score":0.09021259099245071,
+        "token":25169,
+        "token_str":"carpenter"
     },
     {
-        "sequence":"[CLS] …",
-        "score":0.…,
-        "token":…,
-        "token_str":"…"
+        "sequence":"[CLS] The man worked as a cook. [SEP]",
+        "score":0.08125395327806473,
+        "token":9834,
+        "token_str":"cook"
     },
     {
-        "sequence":"[CLS] …",
-        "score":0.…,
-        "token":…,
+        "sequence":"[CLS] The man worked as a mechanic. [SEP]",
+        "score":0.07524766772985458,
+        "token":19459,
         "token_str":"mechanic"
     },
     {
-        "sequence":"[CLS] …",
-        "score":0.…,
-        "token":…,
-        "token_str":"…"
+        "sequence":"[CLS] The man worked as a waiter. [SEP]",
+        "score":0.07397029548883438,
+        "token":17989,
+        "token_str":"waiter"
     },
     {
-        "sequence":"[CLS] …",
-        "score":0.…,
-        "token":…,
-        "token_str":"…"
+        "sequence":"[CLS] The man worked as a guard. [SEP]",
+        "score":0.05848982185125351,
+        "token":3542,
+        "token_str":"guard"
     }
 ]
 
+
 >>> unmasker("The woman worked as a [MASK].")
 [
     {
-        "sequence":"[CLS] …",
-        "score":0.…,
-        "token":…,
-        "token_str":"…"
+        "sequence":"[CLS] The woman worked as a maid. [SEP]",
+        "score":0.19436432421207428,
+        "token":13487,
+        "token_str":"maid"
     },
     {
-        "sequence":"[CLS] …",
-        "score":0.…,
-        "token":…,
-        "token_str":"…"
+        "sequence":"[CLS] The woman worked as a waitress. [SEP]",
+        "score":0.16161060333251953,
+        "token":15098,
+        "token_str":"waitress"
     },
     {
-        "sequence":"[CLS] …",
-        "score":0.…,
-        "token":…,
+        "sequence":"[CLS] The woman worked as a nurse. [SEP]",
+        "score":0.14942803978919983,
+        "token":7439,
         "token_str":"nurse"
     },
     {
-        "sequence":"[CLS] …",
-        "score":0.…,
-        "token":…,
-        "token_str":"…"
+        "sequence":"[CLS] The woman worked as a secretary. [SEP]",
+        "score":0.10373266786336899,
+        "token":4848,
+        "token_str":"secretary"
     },
     {
-        "sequence":"[CLS] …",
-        "score":0.…,
-        "token":…,
-        "token_str":"…"
+        "sequence":"[CLS] The woman worked as a cook. [SEP]",
+        "score":0.06384387612342834,
+        "token":9834,
+        "token_str":"cook"
     }
 ]
 ```
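Both outputs in this hunk share the same result shape (a list of dicts with a `token_str` key), so the disparity between the two prompts can be checked programmatically. A small sketch that reuses the `unmasker` from the earlier snippet:

```python
# Compare top predictions across gendered templates; reuses the
# `unmasker` pipeline defined in the snippet above.
for subject in ("man", "woman"):
    preds = unmasker(f"The {subject} worked as a [MASK].")
    words = [p["token_str"] for p in preds]
    print(f"{subject}: {', '.join(words)}")

# Words predicted for one subject but not the other hint at the bias
# the examples above illustrate.
```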
@@ -230,8 +234,7 @@ When fine-tuned on downstream tasks, this model achieves the following results:
 
 Model | SQUAD 1.1 F1/EM | Multi NLI Accuracy
 ---------------------------------------- | :-------------: | :----------------:
-BERT-Large, …
-…
+BERT-Large, Cased (Whole Word Masking) | 92.9/86.7 | 86.46
 
 ### BibTeX entry and citation info
 
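The SQuAD 1.1 column added above reports F1 and exact match (EM) between predicted and gold answer strings. A minimal sketch of the two metrics, using plain whitespace tokenization (the official evaluation script additionally lowercases and strips punctuation and articles):

```python
from collections import Counter

def exact_match(prediction: str, gold: str) -> float:
    # 1.0 if the strings agree exactly (ignoring surrounding whitespace).
    return float(prediction.strip() == gold.strip())

def f1(prediction: str, gold: str) -> float:
    # Token-level F1: harmonic mean of precision and recall over
    # the multiset of overlapping tokens.
    pred_tokens = prediction.split()
    gold_tokens = gold.split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("the cased model", "the cased model"))  # 1.0
print(f1("the cased model", "the cased BERT model"))      # ~0.857
```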