osanseviero committed on
Commit 7dfbadd • 1 Parent(s): 1ce9484

Update spaCy pipeline

.gitattributes CHANGED
@@ -14,3 +14,7 @@
14
  *.pb filter=lfs diff=lfs merge=lfs -text
15
  *.pt filter=lfs diff=lfs merge=lfs -text
16
  *.pth filter=lfs diff=lfs merge=lfs -text
17
+ *.whl filter=lfs diff=lfs merge=lfs -text
18
+ *.npz filter=lfs diff=lfs merge=lfs -text
19
+ *strings.json filter=lfs diff=lfs merge=lfs -text
20
+ vectors filter=lfs diff=lfs merge=lfs -text
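The four added patterns route wheel files, NumPy archives, the vocabulary string store, and the vectors table through Git LFS. As a rough illustration of which paths they would catch, here is a minimal sketch that approximates gitattributes matching with Python's `fnmatch` on the file's basename. This is an assumption that only holds for simple patterns without a slash, and the example paths are hypothetical, not taken from the commit.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns added to .gitattributes in this commit.
LFS_PATTERNS = ["*.whl", "*.npz", "*strings.json", "vectors"]


def tracked_by_lfs(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: slash-free patterns are matched against the basename only."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


# Hypothetical repository paths, chosen only for illustration.
for example in ["vocab/strings.json", "vocab/vectors", "tok2vec/model.npz", "config.cfg"]:
    print(example, tracked_by_lfs(example))
```

Real gitattributes matching has more rules (directory-anchored patterns, `**`, negation), so treat this as a quick sanity check rather than a replacement for `git check-attr`.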
LICENSE ADDED
@@ -0,0 +1,19 @@
1
+ Copyright 2021 ExplosionAI GmbH
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy of
4
+ this software and associated documentation files (the "Software"), to deal in
5
+ the Software without restriction, including without limitation the rights to
6
+ use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
7
+ of the Software, and to permit persons to whom the Software is furnished to do
8
+ so, subject to the following conditions:
9
+
10
+ The above copyright notice and this permission notice shall be included in all
11
+ copies or substantial portions of the Software.
12
+
13
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
19
+ SOFTWARE.
LICENSES_SOURCES ADDED
@@ -0,0 +1,60 @@
1
+ # OntoNotes 5
2
+
3
+ * Author: Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston
4
+ * URL: https://catalog.ldc.upenn.edu/LDC2013T19
5
+ * License: commercial (licensed by Explosion)
6
+
7
+ ```
8
+ ```
9
+
10
+
11
+
12
+
13
+ # CoreNLP Universal Dependencies Converter
14
+
15
+ * Author: Stanford NLP Group
16
+ * URL: https://nlp.stanford.edu/software/stanford-dependencies.html
17
+ * License: Citation provided for reference, no code packaged with model
18
+
19
+ ```
20
+ ```
21
+
22
+
23
+
24
+
25
+ # Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia)
26
+
27
+ * Author: Explosion
28
+ * URL: https://spacy.io
29
+ * License: CC0
30
+
31
+ ```
32
+ The laws of most jurisdictions throughout the world automatically confer exclusive Copyright and Related Rights (defined below) upon the creator and subsequent owner(s) (each and all, an "owner") of an original work of authorship and/or a database (each, a "Work").
33
+
34
+ Certain owners wish to permanently relinquish those rights to a Work for the purpose of contributing to a commons of creative, cultural and scientific works ("Commons") that the public can reliably and without fear of later claims of infringement build upon, modify, incorporate in other works, reuse and redistribute as freely as possible in any form whatsoever and for any purposes, including without limitation commercial purposes. These owners may contribute to the Commons to promote the ideal of a free culture and the further production of creative, cultural and scientific works, or to gain reputation or greater distribution for their Work in part through the use and efforts of others.
35
+
36
+ For these and/or other purposes and motivations, and without any expectation of additional consideration or compensation, the person associating CC0 with a Work (the "Affirmer"), to the extent that he or she is an owner of Copyright and Related Rights in the Work, voluntarily elects to apply CC0 to the Work and publicly distribute the Work under its terms, with knowledge of his or her Copyright and Related Rights in the Work and the meaning and intended legal effect of CC0 on those rights.
37
+
38
+ 1. Copyright and Related Rights. A Work made available under CC0 may be protected by copyright and related or neighboring rights ("Copyright and Related Rights"). Copyright and Related Rights include, but are not limited to, the following:
39
+
40
+ the right to reproduce, adapt, distribute, perform, display, communicate, and translate a Work;
41
+ moral rights retained by the original author(s) and/or performer(s);
42
+ publicity and privacy rights pertaining to a person's image or likeness depicted in a Work;
43
+ rights protecting against unfair competition in regards to a Work, subject to the limitations in paragraph 4(a), below;
44
+ rights protecting the extraction, dissemination, use and reuse of data in a Work;
45
+ database rights (such as those arising under Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, and under any national implementation thereof, including any amended or successor version of such directive); and
46
+ other similar, equivalent or corresponding rights throughout the world based on applicable law or treaty, and any national implementations thereof.
47
+ 2. Waiver. To the greatest extent permitted by, but not in contravention of, applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and unconditionally waives, abandons, and surrenders all of Affirmer's Copyright and Related Rights and associated claims and causes of action, whether now known or unknown (including existing as well as future claims and causes of action), in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each member of the public at large and to the detriment of Affirmer's heirs and successors, fully intending that such Waiver shall not be subject to revocation, rescission, cancellation, termination, or any other legal or equitable action to disrupt the quiet enjoyment of the Work by the public as contemplated by Affirmer's express Statement of Purpose.
48
+
49
+ 3. Public License Fallback. Should any part of the Waiver for any reason be judged legally invalid or ineffective under applicable law, then the Waiver shall be preserved to the maximum extent permitted taking into account Affirmer's express Statement of Purpose. In addition, to the extent the Waiver is so judged Affirmer hereby grants to each affected person a royalty-free, non transferable, non sublicensable, non exclusive, irrevocable and unconditional license to exercise Affirmer's Copyright and Related Rights in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the "License"). The License shall be deemed effective as of the date CC0 was applied by Affirmer to the Work. Should any part of the License for any reason be judged legally invalid or ineffective under applicable law, such partial invalidity or ineffectiveness shall not invalidate the remainder of the License, and in such case Affirmer hereby affirms that he or she will not (i) exercise any of his or her remaining Copyright and Related Rights in the Work or (ii) assert any associated claims and causes of action with respect to the Work, in either case contrary to Affirmer's express Statement of Purpose.
50
+
51
+ 4. Limitations and Disclaimers.
52
+
53
+ No trademark or patent rights held by Affirmer are waived, abandoned, surrendered, licensed or otherwise affected by this document.
54
+ Affirmer offers the Work as-is and makes no representations or warranties of any kind concerning the Work, express, implied, statutory or otherwise, including without limitation warranties of title, merchantability, fitness for a particular purpose, non infringement, or the absence of latent or other defects, accuracy, or the present or absence of errors, whether or not discoverable, all to the greatest extent permissible under applicable law.
55
+ Affirmer disclaims responsibility for clearing rights of other persons that may apply to the Work or any use thereof, including without limitation any person's Copyright and Related Rights in the Work. Further, Affirmer disclaims responsibility for obtaining any necessary consents, permissions or other rights required for any use of the Work.
56
+ Affirmer understands and acknowledges that Creative Commons is not a party to this document and has no duty or obligation with respect to this CC0 or use of the Work.```
57
+
58
+
59
+
60
+
README.md ADDED
@@ -0,0 +1,103 @@
1
+ ---
2
+ tags:
3
+ - spacy
4
+ - token-classification
5
+ language:
6
+ - zh
7
+ license: MIT
8
+ model-index:
9
+ - name: zh_core_web_lg
10
+ results:
11
+ - tasks:
12
+ name: NER
13
+ type: token-classification
14
+ metrics:
15
+ - name: Precision
16
+ type: precision
17
+ value: 0.7358998362
18
+ - name: Recall
19
+ type: recall
20
+ value: 0.6910989011
21
+ - name: F Score
22
+ type: f_score
23
+ value: 0.7127961011
24
+ - tasks:
25
+ name: POS
26
+ type: token-classification
27
+ metrics:
28
+ - name: Accuracy
29
+ type: accuracy
30
+ value: 0.9037457747
31
+ - tasks:
32
+ name: SENTER
33
+ type: token-classification
34
+ metrics:
35
+ - name: Precision
36
+ type: precision
37
+ value: 0.7896445968
38
+ - name: Recall
39
+ type: recall
40
+ value: 0.7286499084
41
+ - name: F Score
42
+ type: f_score
43
+ value: 0.7579220779
44
+ - tasks:
45
+ name: UNLABELED_DEPENDENCIES
46
+ type: token-classification
47
+ metrics:
48
+ - name: Accuracy
49
+ type: accuracy
50
+ value: 0.7069146954
51
+ - tasks:
52
+ name: LABELED_DEPENDENCIES
53
+ type: token-classification
54
+ metrics:
55
+ - name: Accuracy
56
+ type: accuracy
57
+ value: 0.6555390607
58
+ ---
59
+ ### Details: https://spacy.io/models/zh#zh_core_web_lg
60
+
61
+ Chinese pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter, ner, attribute_ruler.
62
+
63
+ | Feature | Description |
64
+ | --- | --- |
65
+ | **Name** | `zh_core_web_lg` |
66
+ | **Version** | `3.1.0` |
67
+ | **spaCy** | `>=3.1.0,<3.2.0` |
68
+ | **Default Pipeline** | `tok2vec`, `tagger`, `parser`, `attribute_ruler`, `ner` |
69
+ | **Components** | `tok2vec`, `tagger`, `parser`, `senter`, `attribute_ruler`, `ner` |
70
+ | **Vectors** | 500000 keys, 500000 unique vectors (300 dimensions) |
71
+ | **Sources** | [OntoNotes 5](https://catalog.ldc.upenn.edu/LDC2013T19) (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston)<br />[CoreNLP Universal Dependencies Converter](https://nlp.stanford.edu/software/stanford-dependencies.html) (Stanford NLP Group)<br />[Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia)](https://spacy.io) (Explosion) |
72
+ | **License** | `MIT` |
73
+ | **Author** | [Explosion](https://explosion.ai) |
74
+
75
+ ### Label Scheme
76
+
77
+ <details>
78
+
79
+ <summary>View label scheme (101 labels for 4 components)</summary>
80
+
81
+ | Component | Labels |
82
+ | --- | --- |
83
+ | **`tagger`** | `AD`, `AS`, `BA`, `CC`, `CD`, `CS`, `DEC`, `DEG`, `DER`, `DEV`, `DT`, `ETC`, `FW`, `IJ`, `INF`, `JJ`, `LB`, `LC`, `M`, `MSP`, `NN`, `NR`, `NT`, `OD`, `ON`, `P`, `PN`, `PU`, `SB`, `SP`, `URL`, `VA`, `VC`, `VE`, `VV`, `X` |
84
+ | **`parser`** | `ROOT`, `acl`, `advcl:loc`, `advmod`, `advmod:dvp`, `advmod:loc`, `advmod:rcomp`, `amod`, `amod:ordmod`, `appos`, `aux:asp`, `aux:ba`, `aux:modal`, `aux:prtmod`, `auxpass`, `case`, `cc`, `ccomp`, `compound:nn`, `compound:vc`, `conj`, `cop`, `dep`, `det`, `discourse`, `dobj`, `etc`, `mark`, `mark:clf`, `name`, `neg`, `nmod`, `nmod:assmod`, `nmod:poss`, `nmod:prep`, `nmod:range`, `nmod:tmod`, `nmod:topic`, `nsubj`, `nsubj:xsubj`, `nsubjpass`, `nummod`, `parataxis:prnmod`, `punct`, `xcomp` |
85
+ | **`senter`** | `I`, `S` |
86
+ | **`ner`** | `CARDINAL`, `DATE`, `EVENT`, `FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PRODUCT`, `QUANTITY`, `TIME`, `WORK_OF_ART` |
87
+
88
+ </details>
89
+
90
+ ### Accuracy
91
+
92
+ | Type | Score |
93
+ | --- | --- |
94
+ | `TOKEN_ACC` | 97.88 |
95
+ | `TAG_ACC` | 90.37 |
96
+ | `DEP_UAS` | 70.69 |
97
+ | `DEP_LAS` | 65.55 |
98
+ | `ENTS_P` | 73.59 |
99
+ | `ENTS_R` | 69.11 |
100
+ | `ENTS_F` | 71.28 |
101
+ | `SENTS_P` | 78.96 |
102
+ | `SENTS_R` | 72.86 |
103
+ | `SENTS_F` | 75.79 |
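The README accuracy table appears to be the `accuracy.json` fractions multiplied by 100 and rounded to two decimals. The sketch below checks that reading against the values shown in this commit; the helper name `to_percent` is hypothetical.

```python
# Fractions copied from accuracy.json in this commit.
scores = {
    "token_acc": 0.9788303388,
    "tag_acc": 0.9037457747,
    "dep_uas": 0.7069146954,
    "dep_las": 0.6555390607,
    "ents_p": 0.7358998362,
    "ents_r": 0.6910989011,
    "ents_f": 0.7127961011,
    "sents_p": 0.7896445968,
    "sents_r": 0.7286499084,
    "sents_f": 0.7579220779,
}

# Percentages as printed in the README accuracy table.
readme_table = {
    "TOKEN_ACC": 97.88,
    "TAG_ACC": 90.37,
    "DEP_UAS": 70.69,
    "DEP_LAS": 65.55,
    "ENTS_P": 73.59,
    "ENTS_R": 69.11,
    "ENTS_F": 71.28,
    "SENTS_P": 78.96,
    "SENTS_R": 72.86,
    "SENTS_F": 75.79,
}


def to_percent(fraction: float) -> float:
    """Convert a 0..1 score to a percentage rounded to two decimals."""
    return round(fraction * 100, 2)


# Collect any key whose rounded percentage differs from the README value.
mismatches = {
    key: (to_percent(value), readme_table[key.upper()])
    for key, value in scores.items()
    if to_percent(value) != readme_table[key.upper()]
}
print(mismatches)
```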
accuracy.json ADDED
@@ -0,0 +1,332 @@
1
+ {
2
+ "token_acc": 0.9788303388,
3
+ "tag_acc": 0.9037457747,
4
+ "dep_uas": 0.7069146954,
5
+ "dep_las": 0.6555390607,
6
+ "ents_p": 0.7358998362,
7
+ "ents_r": 0.6910989011,
8
+ "ents_f": 0.7127961011,
9
+ "sents_p": 0.7896445968,
10
+ "sents_r": 0.7286499084,
11
+ "sents_f": 0.7579220779,
12
+ "speed": 9733.8076235494,
13
+ "dep_las_per_type": {
14
+ "dep": {
15
+ "p": 0.4876810512,
16
+ "r": 0.3299989896,
17
+ "f": 0.3936362541
18
+ },
19
+ "case": {
20
+ "p": 0.8168795974,
21
+ "r": 0.7674587779,
22
+ "f": 0.7913983872
23
+ },
24
+ "nmod:tmod": {
25
+ "p": 0.7313237221,
26
+ "r": 0.7591836735,
27
+ "f": 0.7449933244
28
+ },
29
+ "nummod": {
30
+ "p": 0.8191268191,
31
+ "r": 0.5249833444,
32
+ "f": 0.6398700771
33
+ },
34
+ "mark:clf": {
35
+ "p": 0.9383017715,
36
+ "r": 0.572920552,
37
+ "f": 0.7114404817
38
+ },
39
+ "auxpass": {
40
+ "p": 0.8817204301,
41
+ "r": 0.8864864865,
42
+ "f": 0.884097035
43
+ },
44
+ "nsubj": {
45
+ "p": 0.7777050039,
46
+ "r": 0.7292715883,
47
+ "f": 0.7527099842
48
+ },
49
+ "acl": {
50
+ "p": 0.7153127247,
51
+ "r": 0.5518580144,
52
+ "f": 0.623043206
53
+ },
54
+ "advmod": {
55
+ "p": 0.8195641156,
56
+ "r": 0.7331670823,
57
+ "f": 0.7739619481
58
+ },
59
+ "mark": {
60
+ "p": 0.7456996746,
61
+ "r": 0.7028921998,
62
+ "f": 0.7236634333
63
+ },
64
+ "xcomp": {
65
+ "p": 0.7944444444,
66
+ "r": 0.6986970684,
67
+ "f": 0.7435008666
68
+ },
69
+ "nmod:assmod": {
70
+ "p": 0.7745130406,
71
+ "r": 0.7301587302,
72
+ "f": 0.7516821532
73
+ },
74
+ "det": {
75
+ "p": 0.8369132856,
76
+ "r": 0.6162858817,
77
+ "f": 0.709851552
78
+ },
79
+ "amod": {
80
+ "p": 0.7794589638,
81
+ "r": 0.6677140613,
82
+ "f": 0.7192722657
83
+ },
84
+ "nmod:prep": {
85
+ "p": 0.7016613644,
86
+ "r": 0.6004234725,
87
+ "f": 0.6471067645
88
+ },
89
+ "root": {
90
+ "p": 0.7394862036,
91
+ "r": 0.6469119361,
92
+ "f": 0.6901083289
93
+ },
94
+ "aux:prtmod": {
95
+ "p": 0.9246031746,
96
+ "r": 0.8321428571,
97
+ "f": 0.8759398496
98
+ },
99
+ "compound:nn": {
100
+ "p": 0.7463895738,
101
+ "r": 0.7170896785,
102
+ "f": 0.7314463238
103
+ },
104
+ "dobj": {
105
+ "p": 0.7939269334,
106
+ "r": 0.7435935417,
107
+ "f": 0.7679363622
108
+ },
109
+ "ccomp": {
110
+ "p": 0.6330907698,
111
+ "r": 0.6426905132,
112
+ "f": 0.6378545244
113
+ },
114
+ "advmod:rcomp": {
115
+ "p": 0.8229813665,
116
+ "r": 0.7340720222,
117
+ "f": 0.775988287
118
+ },
119
+ "nmod:topic": {
120
+ "p": 0.3762886598,
121
+ "r": 0.237012987,
122
+ "f": 0.2908366534
123
+ },
124
+ "cop": {
125
+ "p": 0.7518367347,
126
+ "r": 0.5926640927,
127
+ "f": 0.6628283555
128
+ },
129
+ "discourse": {
130
+ "p": 0.5575139147,
131
+ "r": 0.4958745875,
132
+ "f": 0.5248908297
133
+ },
134
+ "neg": {
135
+ "p": 0.8395802099,
136
+ "r": 0.6658739596,
137
+ "f": 0.7427055703
138
+ },
139
+ "aux:modal": {
140
+ "p": 0.8475289169,
141
+ "r": 0.8335056877,
142
+ "f": 0.8404588113
143
+ },
144
+ "nmod": {
145
+ "p": 0.7278688525,
146
+ "r": 0.6024423338,
147
+ "f": 0.6592427617
148
+ },
149
+ "aux:ba": {
150
+ "p": 0.807486631,
151
+ "r": 0.8031914894,
152
+ "f": 0.8053333333
153
+ },
154
+ "advmod:loc": {
155
+ "p": 0.6349206349,
156
+ "r": 0.4747774481,
157
+ "f": 0.5432937182
158
+ },
159
+ "aux:asp": {
160
+ "p": 0.9013854931,
161
+ "r": 0.8819776715,
162
+ "f": 0.8915759774
163
+ },
164
+ "conj": {
165
+ "p": 0.4869204402,
166
+ "r": 0.5102079395,
167
+ "f": 0.4982922551
168
+ },
169
+ "nsubjpass": {
170
+ "p": 0.8048780488,
171
+ "r": 0.66,
172
+ "f": 0.7252747253
173
+ },
174
+ "compound:vc": {
175
+ "p": 0.4647058824,
176
+ "r": 0.4093264249,
177
+ "f": 0.435261708
178
+ },
179
+ "advcl:loc": {
180
+ "p": 0.5573770492,
181
+ "r": 0.4857142857,
182
+ "f": 0.5190839695
183
+ },
184
+ "cc": {
185
+ "p": 0.7340425532,
186
+ "r": 0.6734693878,
187
+ "f": 0.7024525683
188
+ },
189
+ "advmod:dvp": {
190
+ "p": 0.8320610687,
191
+ "r": 0.6770186335,
192
+ "f": 0.7465753425
193
+ },
194
+ "appos": {
195
+ "p": 0.8740920097,
196
+ "r": 0.8298850575,
197
+ "f": 0.8514150943
198
+ },
199
+ "nmod:poss": {
200
+ "p": 0.7341772152,
201
+ "r": 0.4296296296,
202
+ "f": 0.5420560748
203
+ },
204
+ "name": {
205
+ "p": 0.6018518519,
206
+ "r": 0.4814814815,
207
+ "f": 0.5349794239
208
+ },
209
+ "nsubj:xsubj": {
210
+ "p": 0.0,
211
+ "r": 0.0,
212
+ "f": 0.0
213
+ },
214
+ "nmod:range": {
215
+ "p": 0.7035714286,
216
+ "r": 0.6610738255,
217
+ "f": 0.6816608997
218
+ },
219
+ "parataxis:prnmod": {
220
+ "p": 0.5454545455,
221
+ "r": 0.1353383459,
222
+ "f": 0.2168674699
223
+ },
224
+ "amod:ordmod": {
225
+ "p": 0.564516129,
226
+ "r": 0.546875,
227
+ "f": 0.5555555556
228
+ },
229
+ "erased": {
230
+ "p": 0.0,
231
+ "r": 0.0,
232
+ "f": 0.0
233
+ },
234
+ "etc": {
235
+ "p": 0.9069767442,
236
+ "r": 0.9285714286,
237
+ "f": 0.9176470588
238
+ }
239
+ },
240
+ "ents_per_type": {
241
+ "DATE": {
242
+ "p": 0.7675925926,
243
+ "r": 0.82160555,
244
+ "f": 0.7936811872
245
+ },
246
+ "GPE": {
247
+ "p": 0.7719060524,
248
+ "r": 0.8352883675,
249
+ "f": 0.8023474178
250
+ },
251
+ "ORDINAL": {
252
+ "p": 0.8388888889,
253
+ "r": 0.7947368421,
254
+ "f": 0.8162162162
255
+ },
256
+ "FAC": {
257
+ "p": 0.5581395349,
258
+ "r": 0.3870967742,
259
+ "f": 0.4571428571
260
+ },
261
+ "ORG": {
262
+ "p": 0.7028571429,
263
+ "r": 0.6552511416,
264
+ "f": 0.6782197716
265
+ },
266
+ "LOC": {
267
+ "p": 0.5894039735,
268
+ "r": 0.4784946237,
269
+ "f": 0.528189911
270
+ },
271
+ "QUANTITY": {
272
+ "p": 0.7889908257,
273
+ "r": 0.637037037,
274
+ "f": 0.7049180328
275
+ },
276
+ "WORK_OF_ART": {
277
+ "p": 0.5,
278
+ "r": 0.2866666667,
279
+ "f": 0.3644067797
280
+ },
281
+ "CARDINAL": {
282
+ "p": 0.614744352,
283
+ "r": 0.5211693548,
284
+ "f": 0.5641025641
285
+ },
286
+ "NORP": {
287
+ "p": 0.6755952381,
288
+ "r": 0.4768907563,
289
+ "f": 0.5591133005
290
+ },
291
+ "TIME": {
292
+ "p": 0.7365853659,
293
+ "r": 0.7330097087,
294
+ "f": 0.7347931873
295
+ },
296
+ "MONEY": {
297
+ "p": 0.9322033898,
298
+ "r": 0.8148148148,
299
+ "f": 0.8695652174
300
+ },
301
+ "EVENT": {
302
+ "p": 0.5681818182,
303
+ "r": 0.3676470588,
304
+ "f": 0.4464285714
305
+ },
306
+ "PERSON": {
307
+ "p": 0.8077682686,
308
+ "r": 0.7905927835,
309
+ "f": 0.7990882449
310
+ },
311
+ "PERCENT": {
312
+ "p": 0.7882352941,
313
+ "r": 0.8072289157,
314
+ "f": 0.7976190476
315
+ },
316
+ "PRODUCT": {
317
+ "p": 0.0,
318
+ "r": 0.0,
319
+ "f": 0.0
320
+ },
321
+ "LAW": {
322
+ "p": 0.3333333333,
323
+ "r": 0.1,
324
+ "f": 0.1538461538
325
+ },
326
+ "LANGUAGE": {
327
+ "p": 0.5555555556,
328
+ "r": 0.5555555556,
329
+ "f": 0.5555555556
330
+ }
331
+ }
332
+ }
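Each `p`/`r`/`f` triple in this file is consistent with `f` being the harmonic mean of precision and recall. A minimal sketch, assuming that usual definition (with `f = 0` when both inputs are zero), recomputes a few of the reported values; the function name `f_score` is hypothetical.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f) triples copied from accuracy.json.
reported = {
    "ents": (0.7358998362, 0.6910989011, 0.7127961011),
    "sents": (0.7896445968, 0.7286499084, 0.7579220779),
    "MONEY": (0.9322033898, 0.8148148148, 0.8695652174),
    "PRODUCT": (0.0, 0.0, 0.0),
}

for name, (p, r, f) in reported.items():
    # Differences should be on the order of the 10-digit rounding in the file.
    print(f"{name}: recomputed={f_score(p, r):.10f} reported={f:.10f}")
```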
attribute_ruler/patterns ADDED
Binary file (1.93 kB)
config.cfg ADDED
@@ -0,0 +1,255 @@
1
+ [paths]
2
+ train = "corpus/zh-core-news/train.spacy"
3
+ dev = "corpus/zh-core-news/dev.spacy"
4
+ vectors = "corpus/zh_vectors"
5
+ raw = null
6
+ init_tok2vec = null
7
+ vocab_data = null
8
+
9
+ [system]
10
+ gpu_allocator = null
11
+ seed = 0
12
+
13
+ [nlp]
14
+ lang = "zh"
15
+ pipeline = ["tok2vec","tagger","parser","senter","attribute_ruler","ner"]
16
+ disabled = ["senter"]
17
+ before_creation = null
18
+ after_creation = null
19
+ after_pipeline_creation = null
20
+ batch_size = 256
21
+
22
+ [nlp.tokenizer]
23
+ @tokenizers = "spacy.zh.ChineseTokenizer"
24
+ segmenter = "pkuseg"
25
+
26
+ [components]
27
+
28
+ [components.attribute_ruler]
29
+ factory = "attribute_ruler"
30
+ validate = false
31
+
32
+ [components.ner]
33
+ factory = "ner"
34
+ incorrect_spans_key = null
35
+ moves = null
36
+ update_with_oracle_cut_size = 100
37
+
38
+ [components.ner.model]
39
+ @architectures = "spacy.TransitionBasedParser.v2"
40
+ state_type = "ner"
41
+ extra_state_tokens = false
42
+ hidden_width = 64
43
+ maxout_pieces = 2
44
+ use_upper = true
45
+ nO = null
46
+
47
+ [components.ner.model.tok2vec]
48
+ @architectures = "spacy.Tok2Vec.v2"
49
+
50
+ [components.ner.model.tok2vec.embed]
51
+ @architectures = "spacy.MultiHashEmbed.v2"
52
+ width = 96
53
+ attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
54
+ rows = [5000,2500,2500,2500]
55
+ include_static_vectors = true
56
+
57
+ [components.ner.model.tok2vec.encode]
58
+ @architectures = "spacy.MaxoutWindowEncoder.v2"
59
+ width = 96
60
+ depth = 4
61
+ window_size = 1
62
+ maxout_pieces = 3
63
+
64
+ [components.parser]
65
+ factory = "parser"
66
+ learn_tokens = false
67
+ min_action_freq = 30
68
+ moves = null
69
+ update_with_oracle_cut_size = 100
70
+
71
+ [components.parser.model]
72
+ @architectures = "spacy.TransitionBasedParser.v2"
73
+ state_type = "parser"
74
+ extra_state_tokens = false
75
+ hidden_width = 64
76
+ maxout_pieces = 2
77
+ use_upper = true
78
+ nO = null
79
+
80
+ [components.parser.model.tok2vec]
81
+ @architectures = "spacy.Tok2VecListener.v1"
82
+ width = ${components.tok2vec.model.encode:width}
83
+ upstream = "tok2vec"
84
+
85
+ [components.senter]
86
+ factory = "senter"
87
+
88
+ [components.senter.model]
89
+ @architectures = "spacy.Tagger.v1"
90
+ nO = null
91
+
92
+ [components.senter.model.tok2vec]
93
+ @architectures = "spacy.Tok2Vec.v2"
94
+
95
+ [components.senter.model.tok2vec.embed]
96
+ @architectures = "spacy.MultiHashEmbed.v2"
97
+ width = 16
98
+ attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
99
+ rows = [1000,500,500,500]
100
+ include_static_vectors = true
101
+
102
+ [components.senter.model.tok2vec.encode]
103
+ @architectures = "spacy.MaxoutWindowEncoder.v2"
104
+ width = 16
105
+ depth = 2
106
+ window_size = 1
107
+ maxout_pieces = 2
108
+
109
+ [components.tagger]
110
+ factory = "tagger"
111
+
112
+ [components.tagger.model]
113
+ @architectures = "spacy.Tagger.v1"
114
+ nO = null
115
+
116
+ [components.tagger.model.tok2vec]
117
+ @architectures = "spacy.Tok2VecListener.v1"
118
+ width = ${components.tok2vec.model.encode:width}
119
+ upstream = "tok2vec"
120
+
121
+ [components.tok2vec]
122
+ factory = "tok2vec"
123
+
124
+ [components.tok2vec.model]
125
+ @architectures = "spacy.Tok2Vec.v2"
126
+
127
+ [components.tok2vec.model.embed]
128
+ @architectures = "spacy.MultiHashEmbed.v2"
129
+ width = ${components.tok2vec.model.encode:width}
130
+ attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
131
+ rows = [5000,2500,2500,2500]
132
+ include_static_vectors = true
133
+
134
+ [components.tok2vec.model.encode]
135
+ @architectures = "spacy.MaxoutWindowEncoder.v2"
136
+ width = 96
137
+ depth = 4
138
+ window_size = 1
139
+ maxout_pieces = 3
140
+
141
+ [corpora]
142
+
143
+ [corpora.dev]
144
+ @readers = "spacy.Corpus.v1"
145
+ limit = 0
146
+ max_length = 0
147
+ path = ${paths:dev}
148
+ gold_preproc = false
149
+ augmenter = null
150
+
151
+ [corpora.train]
152
+ @readers = "spacy.Corpus.v1"
153
+ path = ${paths:train}
154
+ max_length = 5000
155
+ gold_preproc = false
156
+ limit = 0
157
+ augmenter = null
158
+
159
+ [training]
160
+ train_corpus = "corpora.train"
161
+ dev_corpus = "corpora.dev"
162
+ seed = ${system:seed}
163
+ gpu_allocator = ${system:gpu_allocator}
164
+ dropout = 0.1
165
+ accumulate_gradient = 1
166
+ patience = 5000
167
+ max_epochs = 0
168
+ max_steps = 0
169
+ eval_frequency = 1000
170
+ frozen_components = []
171
+ before_to_disk = null
172
+ annotating_components = []
173
+
174
+ [training.batcher]
175
+ @batchers = "spacy.batch_by_words.v1"
176
+ discard_oversize = false
177
+ tolerance = 0.2
178
+ get_length = null
179
+
180
+ [training.batcher.size]
181
+ @schedules = "compounding.v1"
182
+ start = 100
183
+ stop = 1000
184
+ compound = 1.001
185
+ t = 0.0
186
+
187
+ [training.logger]
188
+ @loggers = "spacy.WandbLogger.v1"
189
+ project_name = "spacy-v3.0.0a2"
190
+ remove_config_values = []
191
+
192
+ [training.optimizer]
193
+ @optimizers = "Adam.v1"
194
+ beta1 = 0.9
195
+ beta2 = 0.999
196
+ L2_is_weight_decay = true
197
+ L2 = 0.01
198
+ grad_clip = 1.0
199
+ use_averages = true
200
+ eps = 0.00000001
201
+ learn_rate = 0.001
202
+
203
+ [training.score_weights]
204
+ tag_acc = 0.24
205
+ dep_uas = 0.0
206
+ dep_las = 0.24
207
+ dep_las_per_type = null
208
+ sents_p = null
209
+ sents_r = null
210
+ sents_f = 0.03
211
+ ents_f = 0.5
212
+ ents_p = 0.0
213
+ ents_r = 0.0
214
+ ents_per_type = null
215
+
216
+ [pretraining]
217
+
218
+ [initialize]
219
+ vocab_data = ${paths.vocab_data}
220
+ vectors = ${paths.vectors}
221
+ init_tok2vec = ${paths.init_tok2vec}
222
+ before_init = null
223
+ after_init = null
224
+
225
+ [initialize.components]
226
+
227
+ [initialize.components.ner]
228
+
229
+ [initialize.components.ner.labels]
230
+ @readers = "spacy.read_labels.v1"
231
+ path = "corpus/labels/ner.json"
232
+ require = false
233
+
234
+ [initialize.components.parser]
235
+
236
+ [initialize.components.parser.labels]
237
+ @readers = "spacy.read_labels.v1"
238
+ path = "corpus/labels/parser.json"
239
+ require = false
240
+
241
+ [initialize.components.tagger]
242
+
243
+ [initialize.components.tagger.labels]
244
+ @readers = "spacy.read_labels.v1"
245
+ path = "corpus/labels/tagger.json"
246
+ require = false
247
+
248
+ [initialize.lookups]
249
+ @misc = "spacy.LookupsDataLoader.v1"
250
+ lang = ${nlp.lang}
251
+ tables = []
252
+
253
+ [initialize.tokenizer]
254
+ pkuseg_model = "assets/pkuseg_model"
255
+ pkuseg_user_dict = "default"
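Several values in this config are references of the form `${section:key}`, for example `${paths:dev}` and `${components.tok2vec.model.encode:width}`. The sketch below shows how such references resolve, using Python's standard `configparser` with `ExtendedInterpolation` on a small excerpt. It is not spaCy's own loader, and it does not handle the dot form (`${paths.vectors}`) that appears under `[initialize]`. Treating each value as JSON is also an assumption made for this illustration.

```python
import json
from configparser import ConfigParser, ExtendedInterpolation

# Excerpt of config.cfg from this commit, limited to ${section:key} references.
EXCERPT = """
[paths]
dev = "corpus/zh-core-news/dev.spacy"

[system]
seed = 0

[components.tok2vec.model.encode]
width = 96

[components.tagger.model.tok2vec]
width = ${components.tok2vec.model.encode:width}

[corpora.dev]
path = ${paths:dev}

[training]
seed = ${system:seed}
"""

parser = ConfigParser(interpolation=ExtendedInterpolation())
parser.read_string(EXCERPT)


def resolve(section: str, key: str):
    """Return the interpolated value, decoded as JSON (an assumption of this sketch)."""
    return json.loads(parser[section][key])


print(resolve("corpora.dev", "path"))
print(resolve("components.tagger.model.tok2vec", "width"))
print(resolve("training", "seed"))
```

Because the tagger and parser listeners take their `width` from the shared `tok2vec` encoder, changing that single value keeps the listener widths in sync.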
meta.json ADDED
@@ -0,0 +1,508 @@
1
+ {
2
+ "lang":"zh",
3
+ "name":"core_web_lg",
4
+ "version":"3.1.0",
5
+ "description":"Chinese pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter, ner, attribute_ruler.",
6
+ "author":"Explosion",
7
+ "email":"[email protected]",
8
+ "url":"https://explosion.ai",
9
+ "license":"MIT",
10
+ "spacy_version":">=3.1.0,<3.2.0",
11
+ "spacy_git_version":"caba63b74",
12
+ "vectors":{
13
+ "width":300,
14
+ "vectors":500000,
15
+ "keys":500000,
16
+ "name":"zh_vectors"
17
+ },
18
+ "labels":{
19
+ "tok2vec":[
20
+
21
+ ],
22
+ "tagger":[
23
+ "AD",
24
+ "AS",
25
+ "BA",
26
+ "CC",
27
+ "CD",
28
+ "CS",
29
+ "DEC",
30
+ "DEG",
31
+ "DER",
32
+ "DEV",
33
+ "DT",
34
+ "ETC",
35
+ "FW",
36
+ "IJ",
37
+ "INF",
38
+ "JJ",
39
+ "LB",
40
+ "LC",
41
+ "M",
42
+ "MSP",
43
+ "NN",
44
+ "NR",
45
+ "NT",
46
+ "OD",
47
+ "ON",
48
+ "P",
49
+ "PN",
50
+ "PU",
51
+ "SB",
52
+ "SP",
53
+ "URL",
54
+ "VA",
55
+ "VC",
56
+ "VE",
57
+ "VV",
58
+ "X"
59
+ ],
60
+ "parser":[
61
+ "ROOT",
62
+ "acl",
63
+ "advcl:loc",
64
+ "advmod",
65
+ "advmod:dvp",
66
+ "advmod:loc",
67
+ "advmod:rcomp",
68
+ "amod",
69
+ "amod:ordmod",
70
+ "appos",
71
+ "aux:asp",
72
+ "aux:ba",
73
+ "aux:modal",
74
+ "aux:prtmod",
75
+ "auxpass",
76
+ "case",
77
+ "cc",
78
+ "ccomp",
79
+ "compound:nn",
80
+ "compound:vc",
81
+ "conj",
82
+ "cop",
83
+ "dep",
84
+ "det",
85
+ "discourse",
86
+ "dobj",
87
+ "etc",
88
+ "mark",
89
+ "mark:clf",
90
+ "name",
91
+ "neg",
92
+ "nmod",
93
+ "nmod:assmod",
94
+ "nmod:poss",
95
+ "nmod:prep",
96
+ "nmod:range",
97
+ "nmod:tmod",
98
+ "nmod:topic",
99
+ "nsubj",
100
+ "nsubj:xsubj",
101
+ "nsubjpass",
102
+ "nummod",
103
+ "parataxis:prnmod",
104
+ "punct",
105
+ "xcomp"
106
+ ],
107
+ "senter":[
108
+ "I",
109
+ "S"
110
+ ],
111
+ "attribute_ruler":[
112
+
113
+ ],
114
+ "ner":[
115
+ "CARDINAL",
116
+ "DATE",
117
+ "EVENT",
118
+ "FAC",
119
+ "GPE",
120
+ "LANGUAGE",
121
+ "LAW",
122
+ "LOC",
123
+ "MONEY",
124
+ "NORP",
125
+ "ORDINAL",
126
+ "ORG",
127
+ "PERCENT",
128
+ "PERSON",
129
+ "PRODUCT",
130
+ "QUANTITY",
131
+ "TIME",
132
+ "WORK_OF_ART"
133
+ ]
134
+ },
135
+ "pipeline":[
136
+ "tok2vec",
137
+ "tagger",
138
+ "parser",
139
+ "attribute_ruler",
140
+ "ner"
141
+ ],
142
+ "components":[
143
+ "tok2vec",
144
+ "tagger",
145
+ "parser",
146
+ "senter",
147
+ "attribute_ruler",
148
+ "ner"
149
+ ],
150
+ "disabled":[
151
+ "senter"
152
+ ],
+ "performance":{
+ "token_acc":0.9788303388,
+ "tag_acc":0.9037457747,
+ "dep_uas":0.7069146954,
+ "dep_las":0.6555390607,
+ "ents_p":0.7358998362,
+ "ents_r":0.6910989011,
+ "ents_f":0.7127961011,
+ "sents_p":0.7896445968,
+ "sents_r":0.7286499084,
+ "sents_f":0.7579220779,
+ "speed":9733.8076235494,
+ "dep_las_per_type":{
+ "dep":{
+ "p":0.4876810512,
+ "r":0.3299989896,
+ "f":0.3936362541
+ },
+ "case":{
+ "p":0.8168795974,
+ "r":0.7674587779,
+ "f":0.7913983872
+ },
+ "nmod:tmod":{
+ "p":0.7313237221,
+ "r":0.7591836735,
+ "f":0.7449933244
+ },
+ "nummod":{
+ "p":0.8191268191,
+ "r":0.5249833444,
+ "f":0.6398700771
+ },
+ "mark:clf":{
+ "p":0.9383017715,
+ "r":0.572920552,
+ "f":0.7114404817
+ },
+ "auxpass":{
+ "p":0.8817204301,
+ "r":0.8864864865,
+ "f":0.884097035
+ },
+ "nsubj":{
+ "p":0.7777050039,
+ "r":0.7292715883,
+ "f":0.7527099842
+ },
+ "acl":{
+ "p":0.7153127247,
+ "r":0.5518580144,
+ "f":0.623043206
+ },
+ "advmod":{
+ "p":0.8195641156,
+ "r":0.7331670823,
+ "f":0.7739619481
+ },
+ "mark":{
+ "p":0.7456996746,
+ "r":0.7028921998,
+ "f":0.7236634333
+ },
+ "xcomp":{
+ "p":0.7944444444,
+ "r":0.6986970684,
+ "f":0.7435008666
+ },
+ "nmod:assmod":{
+ "p":0.7745130406,
+ "r":0.7301587302,
+ "f":0.7516821532
+ },
+ "det":{
+ "p":0.8369132856,
+ "r":0.6162858817,
+ "f":0.709851552
+ },
+ "amod":{
+ "p":0.7794589638,
+ "r":0.6677140613,
+ "f":0.7192722657
+ },
+ "nmod:prep":{
+ "p":0.7016613644,
+ "r":0.6004234725,
+ "f":0.6471067645
+ },
+ "root":{
+ "p":0.7394862036,
+ "r":0.6469119361,
+ "f":0.6901083289
+ },
+ "aux:prtmod":{
+ "p":0.9246031746,
+ "r":0.8321428571,
+ "f":0.8759398496
+ },
+ "compound:nn":{
+ "p":0.7463895738,
+ "r":0.7170896785,
+ "f":0.7314463238
+ },
+ "dobj":{
+ "p":0.7939269334,
+ "r":0.7435935417,
+ "f":0.7679363622
+ },
+ "ccomp":{
+ "p":0.6330907698,
+ "r":0.6426905132,
+ "f":0.6378545244
+ },
+ "advmod:rcomp":{
+ "p":0.8229813665,
+ "r":0.7340720222,
+ "f":0.775988287
+ },
+ "nmod:topic":{
+ "p":0.3762886598,
+ "r":0.237012987,
+ "f":0.2908366534
+ },
+ "cop":{
+ "p":0.7518367347,
+ "r":0.5926640927,
+ "f":0.6628283555
+ },
+ "discourse":{
+ "p":0.5575139147,
+ "r":0.4958745875,
+ "f":0.5248908297
+ },
+ "neg":{
+ "p":0.8395802099,
+ "r":0.6658739596,
+ "f":0.7427055703
+ },
+ "aux:modal":{
+ "p":0.8475289169,
+ "r":0.8335056877,
+ "f":0.8404588113
+ },
+ "nmod":{
+ "p":0.7278688525,
+ "r":0.6024423338,
+ "f":0.6592427617
+ },
+ "aux:ba":{
+ "p":0.807486631,
+ "r":0.8031914894,
+ "f":0.8053333333
+ },
+ "advmod:loc":{
+ "p":0.6349206349,
+ "r":0.4747774481,
+ "f":0.5432937182
+ },
+ "aux:asp":{
+ "p":0.9013854931,
+ "r":0.8819776715,
+ "f":0.8915759774
+ },
+ "conj":{
+ "p":0.4869204402,
+ "r":0.5102079395,
+ "f":0.4982922551
+ },
+ "nsubjpass":{
+ "p":0.8048780488,
+ "r":0.66,
+ "f":0.7252747253
+ },
+ "compound:vc":{
+ "p":0.4647058824,
+ "r":0.4093264249,
+ "f":0.435261708
+ },
+ "advcl:loc":{
+ "p":0.5573770492,
+ "r":0.4857142857,
+ "f":0.5190839695
+ },
+ "cc":{
+ "p":0.7340425532,
+ "r":0.6734693878,
+ "f":0.7024525683
+ },
+ "advmod:dvp":{
+ "p":0.8320610687,
+ "r":0.6770186335,
+ "f":0.7465753425
+ },
+ "appos":{
+ "p":0.8740920097,
+ "r":0.8298850575,
+ "f":0.8514150943
+ },
+ "nmod:poss":{
+ "p":0.7341772152,
+ "r":0.4296296296,
+ "f":0.5420560748
+ },
+ "name":{
+ "p":0.6018518519,
+ "r":0.4814814815,
+ "f":0.5349794239
+ },
+ "nsubj:xsubj":{
+ "p":0.0,
+ "r":0.0,
+ "f":0.0
+ },
+ "nmod:range":{
+ "p":0.7035714286,
+ "r":0.6610738255,
+ "f":0.6816608997
+ },
+ "parataxis:prnmod":{
+ "p":0.5454545455,
+ "r":0.1353383459,
+ "f":0.2168674699
+ },
+ "amod:ordmod":{
+ "p":0.564516129,
+ "r":0.546875,
+ "f":0.5555555556
+ },
+ "erased":{
+ "p":0.0,
+ "r":0.0,
+ "f":0.0
+ },
+ "etc":{
+ "p":0.9069767442,
+ "r":0.9285714286,
+ "f":0.9176470588
+ }
+ },
+ "ents_per_type":{
+ "DATE":{
+ "p":0.7675925926,
+ "r":0.82160555,
+ "f":0.7936811872
+ },
+ "GPE":{
+ "p":0.7719060524,
+ "r":0.8352883675,
+ "f":0.8023474178
+ },
+ "ORDINAL":{
+ "p":0.8388888889,
+ "r":0.7947368421,
+ "f":0.8162162162
+ },
+ "FAC":{
+ "p":0.5581395349,
+ "r":0.3870967742,
+ "f":0.4571428571
+ },
+ "ORG":{
+ "p":0.7028571429,
+ "r":0.6552511416,
+ "f":0.6782197716
+ },
+ "LOC":{
+ "p":0.5894039735,
+ "r":0.4784946237,
+ "f":0.528189911
+ },
+ "QUANTITY":{
+ "p":0.7889908257,
+ "r":0.637037037,
+ "f":0.7049180328
+ },
+ "WORK_OF_ART":{
+ "p":0.5,
+ "r":0.2866666667,
+ "f":0.3644067797
+ },
+ "CARDINAL":{
+ "p":0.614744352,
+ "r":0.5211693548,
+ "f":0.5641025641
+ },
+ "NORP":{
+ "p":0.6755952381,
+ "r":0.4768907563,
+ "f":0.5591133005
+ },
+ "TIME":{
+ "p":0.7365853659,
+ "r":0.7330097087,
+ "f":0.7347931873
+ },
+ "MONEY":{
+ "p":0.9322033898,
+ "r":0.8148148148,
+ "f":0.8695652174
+ },
+ "EVENT":{
+ "p":0.5681818182,
+ "r":0.3676470588,
+ "f":0.4464285714
+ },
+ "PERSON":{
+ "p":0.8077682686,
+ "r":0.7905927835,
+ "f":0.7990882449
+ },
+ "PERCENT":{
+ "p":0.7882352941,
+ "r":0.8072289157,
+ "f":0.7976190476
+ },
+ "PRODUCT":{
+ "p":0.0,
+ "r":0.0,
+ "f":0.0
+ },
+ "LAW":{
+ "p":0.3333333333,
+ "r":0.1,
+ "f":0.1538461538
+ },
+ "LANGUAGE":{
+ "p":0.5555555556,
+ "r":0.5555555556,
+ "f":0.5555555556
+ }
+ }
+ },
+ "sources":[
+ {
+ "name":"OntoNotes 5",
+ "url":"https://catalog.ldc.upenn.edu/LDC2013T19",
+ "license":"commercial (licensed by Explosion)",
+ "author":"Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston"
+ },
+ {
+ "name":"CoreNLP Universal Dependencies Converter",
+ "url":"https://nlp.stanford.edu/software/stanford-dependencies.html",
+ "author":"Stanford NLP Group",
+ "license":"Citation provided for reference, no code packaged with model"
+ },
+ {
+ "name":"Explosion fastText Vectors (cbow, OSCAR Common Crawl + Wikipedia)",
+ "url":"https://spacy.io",
+ "license":"CC0",
+ "author":"Explosion"
+ }
+ ],
+ "requirements":[
+ "spacy-pkuseg>=0.0.27,<0.1.0"
+ ]
+ }
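The metadata above can be sanity-checked with a few lines of Python. The sketch below is only an illustration, not part of the commit: it assumes the reported F-scores are the harmonic mean of the listed precision and recall, and that `pipeline` is `components` with the `disabled` entries removed. The helper names `f_score` and `active_components` are hypothetical; the values are copied from the file.

```python
# Aggregate NER scores copied from the "performance" block of meta.json.
ents_p = 0.7358998362
ents_r = 0.6910989011
ents_f = 0.7127961011


def f_score(p, r):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


# Component lists copied from meta.json.
components = ["tok2vec", "tagger", "parser", "senter", "attribute_ruler", "ner"]
disabled = ["senter"]
pipeline = ["tok2vec", "tagger", "parser", "attribute_ruler", "ner"]


def active_components(all_components, disabled_components):
    """Return the components that run by default, preserving order."""
    return [name for name in all_components if name not in disabled_components]


# Assumption being checked: ents_f is derived from ents_p and ents_r.
print(abs(f_score(ents_p, ents_r) - ents_f) < 1e-6)
# Assumption being checked: pipeline == components minus disabled.
print(active_components(components, disabled) == pipeline)
```

The same `f_score` helper can be applied to any `p`/`r` pair in `dep_las_per_type` or `ents_per_type`, including the all-zero entries such as `PRODUCT`.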
ner/cfg ADDED
@@ -0,0 +1,13 @@
+ {
+ "moves":null,
+ "update_with_oracle_cut_size":100,
+ "multitasks":[
+
+ ],
+ "min_action_freq":1,
+ "learn_tokens":false,
+ "beam_width":1,
+ "beam_density":0.0,
+ "beam_update_prob":0.0,
+ "incorrect_spans_key":null
+ }
ner/model ADDED
Binary file (6.96 MB).
ner/moves ADDED
@@ -0,0 +1 @@
+ moves {"0":{},"1":{"GPE":15943,"ORG":15205,"DATE":14256,"PERSON":10912,"CARDINAL":7849,"TIME":2905,"NORP":2685,"EVENT":2602,"MONEY":2519,"LOC":2452,"FAC":2256,"WORK_OF_ART":2014,"QUANTITY":1717,"ORDINAL":1156,"PERCENT":852,"LAW":695,"PRODUCT":486,"LANGUAGE":336},"2":{"GPE":15943,"ORG":15205,"DATE":14256,"PERSON":10912,"CARDINAL":7849,"TIME":2905,"NORP":2685,"EVENT":2602,"MONEY":2519,"LOC":2452,"FAC":2256,"WORK_OF_ART":2014,"QUANTITY":1717,"ORDINAL":1156,"PERCENT":852,"LAW":695,"PRODUCT":486,"LANGUAGE":336},"3":{"GPE":15943,"ORG":15205,"DATE":14256,"PERSON":10912,"CARDINAL":7849,"TIME":2905,"NORP":2685,"EVENT":2602,"MONEY":2519,"LOC":2452,"FAC":2256,"WORK_OF_ART":2014,"QUANTITY":1717,"ORDINAL":1156,"PERCENT":852,"LAW":695,"PRODUCT":486,"LANGUAGE":336},"4":{"GPE":15943,"ORG":15205,"DATE":14256,"PERSON":10912,"CARDINAL":7849,"TIME":2905,"NORP":2685,"EVENT":2602,"MONEY":2519,"LOC":2452,"FAC":2256,"WORK_OF_ART":2014,"QUANTITY":1717,"ORDINAL":1156,"PERCENT":852,"LAW":695,"PRODUCT":486,"LANGUAGE":336,"":1},"5":{"":1}} cfg neg_key
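The `moves` payload in `ner/moves` is a JSON object keyed by integer action id, with a label-to-count mapping under each id. As a rough sketch of how it could be inspected, the snippet below ranks the labels of one action by count. The diff does not say what the action ids or the counts mean, so the code treats them as opaque; `rank_labels` is a hypothetical helper, and the excerpt is truncated to five labels copied from action `"1"`.

```python
import json

# Excerpt of the label counts from ner/moves (action "1", first five labels only).
moves_excerpt = (
    '{"0":{},"1":{"GPE":15943,"ORG":15205,"DATE":14256,'
    '"PERSON":10912,"CARDINAL":7849}}'
)


def rank_labels(moves_json, action_id):
    """Return (label, count) pairs for one action id, highest count first."""
    counts = json.loads(moves_json).get(action_id, {})
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


ranked = rank_labels(moves_excerpt, "1")
for label, count in ranked:
    print(f"{label}\t{count}")
```

The same helper works on the `parser/moves` payload, where the labels are dependency relations rather than entity types.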
parser/cfg ADDED
@@ -0,0 +1,13 @@
+ {
+ "moves":null,
+ "update_with_oracle_cut_size":100,
+ "multitasks":[
+
+ ],
+ "min_action_freq":30,
+ "learn_tokens":false,
+ "beam_width":1,
+ "beam_density":0.0,
+ "beam_update_prob":0.0,
+ "incorrect_spans_key":null
+ }
parser/model ADDED
Binary file (309 kB).
parser/moves ADDED
@@ -0,0 +1 @@
+ moves {"0":{"":406716},"1":{"":267231},"2":{"advmod":56960,"nsubj":53520,"compound:nn":43919,"dep":40111,"punct":36035,"case":23986,"nmod:assmod":21599,"nmod:prep":20098,"amod":16922,"acl":11979,"conj":10687,"cop":7238,"det":7210,"nummod":6994,"cc":6235,"aux:modal":5566,"nmod:tmod":5335,"nmod":4915,"neg":4363,"xcomp":3881,"appos":2955,"nmod:topic":2410,"discourse":2163,"advmod:loc":1591,"aux:prtmod":1539,"aux:ba":1311,"auxpass":1220,"advmod:dvp":1142,"advcl:loc":1046,"name":1032,"compound:vc":830,"nmod:poss":560,"amod:ordmod":511,"dobj":406,"nsubjpass":263,"nsubj:xsubj||ccomp":62,"parataxis:prnmod":34,"nsubj:xsubj":32},"3":{"punct":74006,"dobj":45383,"conj":30040,"case":30024,"dep":18660,"ccomp":17216,"mark":16600,"mark:clf":11551,"aux:asp":7896,"discourse":3998,"advmod:rcomp":2387,"nmod:range":1885,"cc":1675,"nmod:prep":1595,"advmod":1116,"etc":941,"compound:vc":790,"parataxis:prnmod":693,"advmod:loc":522,"neg":69,"advcl:loc":39,"acl":39},"4":{"ROOT":34525}} cfg neg_key
senter/cfg ADDED
@@ -0,0 +1,3 @@
+ {
+
+ }
senter/model ADDED
Binary file (213 kB).
tagger/cfg ADDED
@@ -0,0 +1,40 @@
+ {
+ "labels":[
+ "AD",
+ "AS",
+ "BA",
+ "CC",
+ "CD",
+ "CS",
+ "DEC",
+ "DEG",
+ "DER",
+ "DEV",
+ "DT",
+ "ETC",
+ "FW",
+ "IJ",
+ "INF",
+ "JJ",
+ "LB",
+ "LC",
+ "M",
+ "MSP",
+ "NN",
+ "NR",
+ "NT",
+ "OD",
+ "ON",
+ "P",
+ "PN",
+ "PU",
+ "SB",
+ "SP",
+ "URL",
+ "VA",
+ "VC",
+ "VE",
+ "VV",
+ "X"
+ ]
+ }
tagger/model ADDED
Binary file (14.3 kB).
tok2vec/cfg ADDED
@@ -0,0 +1,3 @@
+ {
+
+ }
tok2vec/model ADDED
Binary file (6.81 MB).
tokenizer/cfg ADDED
@@ -0,0 +1,3 @@
+ {
+ "segmenter":"pkuseg"
+ }
tokenizer/pkuseg_model/features.msgpack ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fd4322482a7018b9bce9216173ae9d2848efe6d310b468bbb4383fb55c874a18
+ size 22685181
tokenizer/pkuseg_model/weights.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5ada075eb25a854f71d6e6fa4e7d55e7be0ae049255b1f8f19d05c13b1b68c9e
+ size 37508754
tokenizer/pkuseg_processors ADDED
Binary file (4.53 MB).
vocab/key2row ADDED
Binary file (6.87 MB).
vocab/lookups.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:76be8b528d0075f7aae98d6fa57a6d3c83ae480a8469e668d7b0af968995ac71
+ size 1
vocab/strings.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:401539f9b54cffa79ffd8de96bdd43f4a6caff75dbb63a9cb3655696190fcfb6
+ size 9845085
vocab/vectors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:761b22330b44dfde9c65f6646d02c785e3935b34410802e4fc9297ca3b5ba3f6
+ size 600000128
zh_core_web_lg-any-py3-none-any.whl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:292a92db6ef0ef5c60756e6de7bc98bb43fdf92655b6def5fb7558e2e8cd8474
+ size 603784210
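Several files in this commit are stored as Git LFS pointers: three `key value` lines giving the spec version, the object id, and the size in bytes. A minimal sketch of reading one follows, assuming the three-line layout shown in the diff. `parse_lfs_pointer` is a hypothetical helper, not part of any library, and the pointer text is the one for the `.whl` file.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into version, oid algorithm, oid digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", split at the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from zh_core_web_lg-any-py3-none-any.whl in the diff.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:292a92db6ef0ef5c60756e6de7bc98bb43fdf92655b6def5fb7558e2e8cd8474
size 603784210
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["oid_algo"], pointer["size"])
```

The `size` field is the byte length of the real object, so it can be compared against a downloaded file before loading it.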