---
tags:
- generated_from_trainer
- ner
- named-entity-recognition
- span-marker
model-index:
- name: span-marker-bert-base-multilingual-uncased-multinerd
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      type: Babelscape/multinerd
      name: MultiNERD
      split: test
      revision: 2814b78e7af4b5a1f1886fe7ad49632de4d9dd25
    metrics:
    - type: f1
      value: 0.9187
      name: F1
    - type: precision
      value: 0.9202
      name: Precision
    - type: recall
      value: 0.9172
      name: Recall
license: apache-2.0
datasets:
- Babelscape/multinerd
metrics:
- precision
- recall
- f1
pipeline_tag: token-classification
language:
- de
- en
- es
- fr
- it
- nl
- pl
- pt
- ru
- zh
---

# span-marker-bert-base-multilingual-uncased-multinerd

This model is a fine-tuned version of [bert-base-multilingual-uncased](https://huggingface.co/bert-base-multilingual-uncased) on the [Babelscape/multinerd](https://huggingface.co/datasets/Babelscape/multinerd) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0054
- Overall Precision: 0.9275
- Overall F1: 0.9210
- Overall Accuracy: 0.9842

Test set results:

- test_loss: 0.0058621917851269245
- test_overall_accuracy: 0.9831472809849865
- test_overall_f1: 0.9187844693592546
- test_overall_precision: 0.9202802342397876
- test_overall_recall: 0.9172935588307115
- test_runtime: 2716.7472
- test_samples_per_second: 149.141
- test_steps_per_second: 4.661
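
For reference, a minimal sketch of how such test-set numbers can be computed. It assumes `span_marker`'s `Trainer` keeps the underlying `transformers` `evaluate(eval_dataset, metric_key_prefix=...)` signature; treat it as an approximation, not the exact evaluation script.

```python
# Minimal evaluation sketch; assumes span_marker's Trainer inherits the
# transformers signature trainer.evaluate(eval_dataset, metric_key_prefix=...).
from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer

model = SpanMarkerModel.from_pretrained(
    "lxyuan/span-marker-bert-base-multilingual-uncased-multinerd"
)
test_dataset = load_dataset("Babelscape/multinerd", split="test")

trainer = Trainer(model=model)
metrics = trainer.evaluate(test_dataset, metric_key_prefix="test")
print(metrics)  # keys like test_loss, test_overall_f1, test_overall_precision, ...
```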
 
Note:
This is a replication of Tom Aarsen's work, trained with slightly different hyperparameters: `epochs=3` and `gradient_accumulation_steps=2`.
We also switched to the uncased [bert model](https://huggingface.co/bert-base-multilingual-uncased) to see whether an uncased encoder performs better on commonly lowercased entities such as food; please check the discussion [here](https://huggingface.co/lxyuan/span-marker-bert-base-multilingual-cased-multinerd/discussions/1).
Refer to the official [model page](https://huggingface.co/tomaarsen/span-marker-mbert-base-multinerd) for their results and training script.
 
## Results

| **Language** | **Precision** | **Recall** | **F1**    |
|--------------|---------------|------------|-----------|
| **all**      | 92.03         | 91.73      | **91.88** |
| **de**       | 94.96         | 94.87      | **94.91** |
| **en**       | 93.69         | 93.75      | **93.72** |
| **es**       | 91.19         | 90.69      | **90.94** |
| **fr**       | 91.36         | 90.74      | **91.05** |
| **it**       | 90.51         | 92.57      | **91.53** |
| **nl**       | 93.23         | 92.13      | **92.67** |
| **pl**       | 92.17         | 91.59      | **91.88** |
| **pt**       | 92.70         | 91.59      | **92.14** |
| **ru**       | 92.31         | 92.36      | **92.34** |
| **zh**       | 88.91         | 87.53      | **88.22** |
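
The per-language rows can be reproduced by filtering the test split on its `lang` column. This is a sketch that assumes each Babelscape/multinerd test example carries a `lang` field, and it reuses the `trainer` and `test_dataset` from the evaluation sketch above.

```python
# Per-language evaluation sketch; assumes each test example has a "lang" field
# and reuses the trainer/test_dataset from the evaluation sketch above.
for lang in ["de", "en", "es", "fr", "it", "nl", "pl", "pt", "ru", "zh"]:
    subset = test_dataset.filter(lambda example, l=lang: example["lang"] == l)
    print(lang, trainer.evaluate(subset, metric_key_prefix=f"test_{lang}"))
```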
 
Below is a combined table that compares the results of the cased and uncased models for each language:

| **Language** | **Metric**   | **Cased** | **Uncased** |
|--------------|--------------|-----------|-------------|
| **all**      | Precision    | 92.42     | 92.03       |
|              | Recall       | 92.81     | 91.73       |
|              | F1           | **92.61** | 91.88       |
| **de**       | Precision    | 95.03     | 94.96       |
|              | Recall       | 95.07     | 94.87       |
|              | F1           | **95.05** | 94.91       |
| **en**       | Precision    | 95.00     | 93.69       |
|              | Recall       | 95.40     | 93.75       |
|              | F1           | **95.20** | 93.72       |
| **es**       | Precision    | 92.05     | 91.19       |
|              | Recall       | 91.37     | 90.69       |
|              | F1           | **91.71** | 90.94       |
| **fr**       | Precision    | 92.37     | 91.36       |
|              | Recall       | 91.41     | 90.74       |
|              | F1           | **91.89** | 91.05       |
| **it**       | Precision    | 91.45     | 90.51       |
|              | Recall       | 93.15     | 92.57       |
|              | F1           | **92.29** | 91.53       |
| **nl**       | Precision    | 93.85     | 93.23       |
|              | Recall       | 92.98     | 92.13       |
|              | F1           | **93.41** | 92.67       |
| **pl**       | Precision    | 93.13     | 92.17       |
|              | Recall       | 92.66     | 91.59       |
|              | F1           | **92.89** | 91.88       |
| **pt**       | Precision    | 93.60     | 92.70       |
|              | Recall       | 92.50     | 91.59       |
|              | F1           | **93.05** | 92.14       |
| **ru**       | Precision    | 93.25     | 92.31       |
|              | Recall       | 93.32     | 92.36       |
|              | F1           | **93.29** | 92.34       |
| **zh**       | Precision    | 89.47     | 88.91       |
|              | Recall       | 88.40     | 87.53       |
|              | F1           | **88.93** | 88.22       |

Short discussion:
Looking at these results, one might conclude that the cased model is better than the uncased one, as it outperforms the latter across all languages. However, I recommend that users test both models on their specific datasets (or domains) to determine which one actually delivers better performance. This suggestion stems from a brief comparison I conducted on FOOD entities, where I found that both the cased and uncased models are sensitive to the full stop punctuation mark; see the Quick Comparison on FOOD Entities section below.

## Label set

| Class | Description | Examples |
|-------|-------------|----------|
| **PER (person)** | People | Ray Charles, Jessica Alba, Leonardo DiCaprio, Roger Federer, Anna Massey. |
| **ORG (organization)** | Associations, companies, agencies, institutions, nationalities and religious or political groups | University of Edinburgh, San Francisco Giants, Google, Democratic Party. |
| **LOC (location)** | Physical locations (e.g. mountains, bodies of water), geopolitical entities (e.g. cities, states), and facilities (e.g. bridges, buildings, airports). | Rome, Lake Paiku, Chrysler Building, Mount Rushmore, Mississippi River. |
| **ANIM (animal)** | Breeds of dogs, cats and other animals, including their scientific names. | Maine Coon, African Wild Dog, Great White Shark, New Zealand Bellbird. |
| **BIO (biological)** | Genus of fungus, bacteria and protoctists, families of viruses, and other biological entities. | Herpes Simplex Virus, Escherichia Coli, Salmonella, Bacillus Anthracis. |
| **CEL (celestial)** | Planets, stars, asteroids, comets, nebulae, galaxies and other astronomical objects. | Sun, Neptune, Asteroid 187 Lamberta, Proxima Centauri, V838 Monocerotis. |
| **DIS (disease)** | Physical, mental, infectious, non-infectious, deficiency, inherited, degenerative, social and self-inflicted diseases. | Alzheimer's Disease, Cystic Fibrosis, Dilated Cardiomyopathy, Arthritis. |
| **EVE (event)** | Sport events, battles, wars and other events. | American Civil War, 2003 Wimbledon Championships, Cannes Film Festival. |
| **FOOD (food)** | Foods and drinks. | Carbonara, Sangiovese, Cheddar Beer Fondue, Pizza Margherita. |
| **INST (instrument)** | Technological instruments, mechanical instruments, musical instruments, and other tools. | Spitzer Space Telescope, Commodore 64, Skype, Apple Watch, Fender Stratocaster. |
| **MEDIA (media)** | Titles of films, books, magazines, songs and albums, fictional characters and languages. | Forbes, American Psycho, Kiss Me Once, Twin Peaks, Disney Adventures. |
| **PLANT (plant)** | Types of trees, flowers, and other plants, including their scientific names. | Salix, Quercus Petraea, Douglas Fir, Forsythia, Artemisia Maritima. |
| **MYTH (mythological)** | Mythological and religious entities. | Apollo, Persephone, Aphrodite, Saint Peter, Pope Gregory I, Hercules. |
| **TIME (time)** | Specific and well-defined time intervals, such as eras, historical periods, centuries, years and important days. No months and days of the week. | Renaissance, Middle Ages, Christmas, Great Depression, 17th Century, 2012. |
| **VEHI (vehicle)** | Cars, motorcycles and other vehicles. | Ferrari Testarossa, Suzuki Jimny, Honda CR-X, Boeing 747, Fairey Fulmar. |

## Inference Example

```python
# install the package first with: pip install span_marker
from span_marker import SpanMarkerModel

model = SpanMarkerModel.from_pretrained("lxyuan/span-marker-bert-base-multilingual-uncased-multinerd")

description = "Singapore is renowned for its hawker centers offering dishes \
like Hainanese chicken rice and laksa, while Malaysia boasts dishes such as \
nasi lemak and rendang, reflecting its rich culinary heritage."

entities = model.predict(description)

entities
>>>
[
{'span': 'Singapore', 'label': 'LOC', 'score': 0.9999247789382935, 'char_start_index': 0, 'char_end_index': 9},
{'span': 'laksa', 'label': 'FOOD', 'score': 0.794235348701477, 'char_start_index': 93, 'char_end_index': 98},
{'span': 'Malaysia', 'label': 'LOC', 'score': 0.9999157190322876, 'char_start_index': 106, 'char_end_index': 114}
]

# missed: Hainanese chicken rice as FOOD
# missed: nasi lemak as FOOD
# missed: rendang as FOOD

# note: unfortunately, this uncased version still fails to pick up those commonly
# lowercased food entities, and even misses the capitalized `Hainanese chicken rice` entity.
```

### Quick test on Chinese

```python
from span_marker import SpanMarkerModel

model = SpanMarkerModel.from_pretrained("lxyuan/span-marker-bert-base-multilingual-uncased-multinerd")

# Chinese translation of the English description used above
zh_description = "新加坡因其小贩中心提供海南鸡饭和叻沙等菜肴而闻名, 而马来西亚则拥有椰浆饭和仁当等菜肴,反映了其丰富的烹饪传统."

entities = model.predict(zh_description)

entities
>>>
[
{'span': '新加坡', 'label': 'LOC', 'score': 0.8477746248245239, 'char_start_index': 0, 'char_end_index': 3},
{'span': '马来西亚', 'label': 'LOC', 'score': 0.7525337934494019, 'char_start_index': 27, 'char_end_index': 31}
]

# It only managed to capture the two countries, Singapore and Malaysia;
# all other entities were missed. This matches the prediction of the cased model:
# https://huggingface.co/lxyuan/span-marker-bert-base-multilingual-cased-multinerd
```

### Quick Comparison on FOOD Entities

In this quick comparison, we found that a trailing full stop seems to help the uncased model identify food entities,
regardless of whether they are lowercased, capitalized, or uppercased. In contrast, the cased model does not respond well
to full stops, and adding one lowers its prediction scores.

```python
from span_marker import SpanMarkerModel

cased_model = SpanMarkerModel.from_pretrained("lxyuan/span-marker-bert-base-multilingual-cased-multinerd")
uncased_model = SpanMarkerModel.from_pretrained("lxyuan/span-marker-bert-base-multilingual-uncased-multinerd")

# no full stop
uncased_model.predict("i love fried chicken and korea bbq")
>>> []

uncased_model.predict("i love fried chicken and korea BBQ")  # uppercase BBQ only
>>> []

uncased_model.predict("i love fried chicken and Korea BBQ")  # capitalized Korea and uppercase BBQ
>>> []

# add a full stop to get better results
uncased_model.predict("i love fried chicken and korea bbq.")
>>> [
{'span': 'fried chicken', 'label': 'FOOD', 'score': 0.6531468629837036, 'char_start_index': 7, 'char_end_index': 20},
{'span': 'korea bbq', 'label': 'FOOD', 'score': 0.9738698601722717, 'char_start_index': 25, 'char_end_index': 34}
]

uncased_model.predict("i love fried chicken and korea BBQ.")
>>> [
{'span': 'fried chicken', 'label': 'FOOD', 'score': 0.6531468629837036, 'char_start_index': 7, 'char_end_index': 20},
{'span': 'korea BBQ', 'label': 'FOOD', 'score': 0.9738698601722717, 'char_start_index': 25, 'char_end_index': 34}
]

uncased_model.predict("i love fried chicken and Korea BBQ.")
>>> [
{'span': 'fried chicken', 'label': 'FOOD', 'score': 0.6531468629837036, 'char_start_index': 7, 'char_end_index': 20},
{'span': 'Korea BBQ', 'label': 'FOOD', 'score': 0.9738698601722717, 'char_start_index': 25, 'char_end_index': 34}
]


# no full stop
cased_model.predict("i love fried chicken and korea bbq")
>>> [
{'span': 'korea bbq', 'label': 'FOOD', 'score': 0.5054221749305725, 'char_start_index': 25, 'char_end_index': 34}
]

cased_model.predict("i love fried chicken and korea BBQ")
>>> [
{'span': 'korea BBQ', 'label': 'FOOD', 'score': 0.6987857222557068, 'char_start_index': 25, 'char_end_index': 34}
]

cased_model.predict("i love fried chicken and Korea BBQ")
>>> [
{'span': 'Korea BBQ', 'label': 'FOOD', 'score': 0.9755308032035828, 'char_start_index': 25, 'char_end_index': 34}
]

# adding a full stop hurts the cased model's predictions
cased_model.predict("i love fried chicken and korea bbq.")
>>> []

cased_model.predict("i love fried chicken and korea BBQ.")
>>> [
{'span': 'korea BBQ', 'label': 'FOOD', 'score': 0.5078140497207642, 'char_start_index': 25, 'char_end_index': 34}
]

cased_model.predict("i love fried chicken and Korea BBQ.")
>>> [
{'span': 'Korea BBQ', 'label': 'FOOD', 'score': 0.895089328289032, 'char_start_index': 25, 'char_end_index': 34}
]
```

## Training procedure

One can reproduce these results by running the official [training script](https://huggingface.co/tomaarsen/span-marker-mbert-base-multinerd/blob/main/train.py); a sketch of the main pieces, with our changes, is shown below.
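
As a rough guide only: the learning rate, batch size, sequence lengths, and label order below are assumptions (check the official script for the exact values); only `num_train_epochs=3` and `gradient_accumulation_steps=2` reflect the changes described in the note above.

```python
# Minimal training sketch, NOT the official train.py. Values marked as
# assumptions should be verified against the official training script.
from datasets import load_dataset
from transformers import TrainingArguments
from span_marker import SpanMarkerModel, Trainer

dataset = load_dataset("Babelscape/multinerd")

# IOB2 labels over the 15 MultiNERD classes; the order must match the
# dataset's integer tags (assumption: verify against the official script).
classes = ["PER", "ORG", "LOC", "ANIM", "BIO", "CEL", "DIS", "EVE", "FOOD",
           "INST", "MEDIA", "PLANT", "MYTH", "TIME", "VEHI"]
labels = ["O"] + [f"{prefix}-{c}" for c in classes for prefix in ("B", "I")]

model = SpanMarkerModel.from_pretrained(
    "bert-base-multilingual-uncased",  # uncased encoder, unlike the original run
    labels=labels,
    model_max_length=256,   # assumption
    entity_max_length=8,    # assumption
)

args = TrainingArguments(
    output_dir="span-marker-bert-base-multilingual-uncased-multinerd",
    learning_rate=5e-5,                # assumption
    per_device_train_batch_size=32,    # assumption
    gradient_accumulation_steps=2,     # changed from the original run
    num_train_epochs=3,                # changed from the original run
    evaluation_strategy="steps",
    eval_steps=1000,
    save_strategy="steps",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
```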

### Training hyperparameters

The following hyperparameters were used during training:

### Framework versions

- Transformers 4.30.2
- Pytorch 2.0.1+cu117
- Datasets 2.14.3
- Tokenizers 0.13.3