---
tags:
- generated_from_trainer
- ner
- named-entity-recognition
- span-marker
model-index:
- name: span-marker-bert-base-multilingual-uncased-multinerd
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      type: Babelscape/multinerd
      name: MultiNERD
      split: test
      revision: 2814b78e7af4b5a1f1886fe7ad49632de4d9dd25
    metrics:
    - type: f1
      value: 0.9187
      name: F1
    - type: precision
      value: 0.9202
      name: Precision
    - type: recall
      value: 0.9172
      name: Recall
license: apache-2.0
datasets:
- Babelscape/multinerd
metrics:
- precision
- recall
- f1
pipeline_tag: token-classification
language:
- de
- en
- es
- fr
- it
- nl
- pl
- pt
- ru
- zh
---

# span-marker-bert-base-multilingual-uncased-multinerd

This model is a fine-tuned version of [bert-base-multilingual-uncased](https://huggingface.co/bert-base-multilingual-uncased) on the [Babelscape/multinerd](https://huggingface.co/datasets/Babelscape/multinerd) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0054
- Overall Precision: 0.9275
- Overall F1: 0.9210
- Overall Accuracy: 0.9842

Test set results:

- test_loss: 0.0058621917851269245
- test_overall_accuracy: 0.9831472809849865
- test_overall_f1: 0.9187844693592546
- test_overall_precision: 0.9202802342397876
- test_overall_recall: 0.9172935588307115
- test_runtime: 2716.7472
- test_samples_per_second: 149.141
- test_steps_per_second: 4.661
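
For reference, a minimal sketch of how such test-set numbers can be computed. It assumes `span_marker`'s `Trainer` keeps the underlying `transformers` `evaluate(eval_dataset, metric_key_prefix=...)` signature; treat it as an approximation, not the exact evaluation script.

```python
# Minimal evaluation sketch; assumes span_marker's Trainer inherits the
# transformers signature trainer.evaluate(eval_dataset, metric_key_prefix=...).
from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer

model = SpanMarkerModel.from_pretrained(
    "lxyuan/span-marker-bert-base-multilingual-uncased-multinerd"
)
test_dataset = load_dataset("Babelscape/multinerd", split="test")

trainer = Trainer(model=model)
metrics = trainer.evaluate(test_dataset, metric_key_prefix="test")
print(metrics)  # keys like test_loss, test_overall_f1, test_overall_precision, ...
```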
 
Note:
This is a replication of Tom Aarsen's work, trained with slightly different hyperparameters: `epochs=3` and `gradient_accumulation_steps=2`.
We also switched to the uncased [bert model](https://huggingface.co/bert-base-multilingual-uncased) to see whether an uncased encoder performs better on commonly lowercased entities such as food; please check the discussion [here](https://huggingface.co/lxyuan/span-marker-bert-base-multilingual-cased-multinerd/discussions/1).
Refer to the official [model page](https://huggingface.co/tomaarsen/span-marker-mbert-base-multinerd) for their results and training script.
 
## Results

| **Language** | **Precision** | **Recall** | **F1**    |
|--------------|---------------|------------|-----------|
| **all**      | 92.03         | 91.73      | **91.88** |
| **de**       | 94.96         | 94.87      | **94.91** |
| **en**       | 93.69         | 93.75      | **93.72** |
| **es**       | 91.19         | 90.69      | **90.94** |
| **fr**       | 91.36         | 90.74      | **91.05** |
| **it**       | 90.51         | 92.57      | **91.53** |
| **nl**       | 93.23         | 92.13      | **92.67** |
| **pl**       | 92.17         | 91.59      | **91.88** |
| **pt**       | 92.70         | 91.59      | **92.14** |
| **ru**       | 92.31         | 92.36      | **92.34** |
| **zh**       | 88.91         | 87.53      | **88.22** |
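
The per-language rows can be reproduced by filtering the test split on its `lang` column. This is a sketch that assumes each Babelscape/multinerd test example carries a `lang` field, and it reuses the `trainer` and `test_dataset` from the evaluation sketch above.

```python
# Per-language evaluation sketch; assumes each test example has a "lang" field
# and reuses the trainer/test_dataset from the evaluation sketch above.
for lang in ["de", "en", "es", "fr", "it", "nl", "pl", "pt", "ru", "zh"]:
    subset = test_dataset.filter(lambda example, l=lang: example["lang"] == l)
    print(lang, trainer.evaluate(subset, metric_key_prefix=f"test_{lang}"))
```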
 
Below is a combined table that compares the results of the cased and uncased models for each language:

| **Language** | **Metric**   | **Cased** | **Uncased** |
|--------------|--------------|-----------|-------------|
| **all**      | Precision    | 92.42     | 92.03       |
|              | Recall       | 92.81     | 91.73       |
|              | F1           | **92.61** | 91.88       |
| **de**       | Precision    | 95.03     | 94.96       |
|              | Recall       | 95.07     | 94.87       |
|              | F1           | **95.05** | 94.91       |
| **en**       | Precision    | 95.00     | 93.69       |
|              | Recall       | 95.40     | 93.75       |
|              | F1           | **95.20** | 93.72       |
| **es**       | Precision    | 92.05     | 91.19       |
|              | Recall       | 91.37     | 90.69       |
|              | F1           | **91.71** | 90.94       |
| **fr**       | Precision    | 92.37     | 91.36       |
|              | Recall       | 91.41     | 90.74       |
|              | F1           | **91.89** | 91.05       |
| **it**       | Precision    | 91.45     | 90.51       |
|              | Recall       | 93.15     | 92.57       |
|              | F1           | **92.29** | 91.53       |
| **nl**       | Precision    | 93.85     | 93.23       |
|              | Recall       | 92.98     | 92.13       |
|              | F1           | **93.41** | 92.67       |
| **pl**       | Precision    | 93.13     | 92.17       |
|              | Recall       | 92.66     | 91.59       |
|              | F1           | **92.89** | 91.88       |
| **pt**       | Precision    | 93.60     | 92.70       |
|              | Recall       | 92.50     | 91.59       |
|              | F1           | **93.05** | 92.14       |
| **ru**       | Precision    | 93.25     | 92.31       |
|              | Recall       | 93.32     | 92.36       |
|              | F1           | **93.29** | 92.34       |
| **zh**       | Precision    | 89.47     | 88.91       |
|              | Recall       | 88.40     | 87.53       |
|              | F1           | **88.93** | 88.22       |

Short discussion:
Looking at these results, one might conclude that the cased model is better than the uncased one, as it outperforms the latter across all languages. However, I recommend that users test both models on their specific datasets (or domains) to determine which one actually delivers better performance. This suggestion stems from a brief comparison I conducted on FOOD entities, where I found that both the cased and uncased models are sensitive to the full stop punctuation mark; see the Quick Comparison on FOOD Entities section below.

## Label set

| Class | Description | Examples |
|-------|-------------|----------|
| **PER (person)** | People | Ray Charles, Jessica Alba, Leonardo DiCaprio, Roger Federer, Anna Massey. |
| **ORG (organization)** | Associations, companies, agencies, institutions, nationalities and religious or political groups | University of Edinburgh, San Francisco Giants, Google, Democratic Party. |
| **LOC (location)** | Physical locations (e.g. mountains, bodies of water), geopolitical entities (e.g. cities, states), and facilities (e.g. bridges, buildings, airports). | Rome, Lake Paiku, Chrysler Building, Mount Rushmore, Mississippi River. |
| **ANIM (animal)** | Breeds of dogs, cats and other animals, including their scientific names. | Maine Coon, African Wild Dog, Great White Shark, New Zealand Bellbird. |
| **BIO (biological)** | Genus of fungus, bacteria and protoctists, families of viruses, and other biological entities. | Herpes Simplex Virus, Escherichia Coli, Salmonella, Bacillus Anthracis. |
| **CEL (celestial)** | Planets, stars, asteroids, comets, nebulae, galaxies and other astronomical objects. | Sun, Neptune, Asteroid 187 Lamberta, Proxima Centauri, V838 Monocerotis. |
| **DIS (disease)** | Physical, mental, infectious, non-infectious, deficiency, inherited, degenerative, social and self-inflicted diseases. | Alzheimer's Disease, Cystic Fibrosis, Dilated Cardiomyopathy, Arthritis. |
| **EVE (event)** | Sport events, battles, wars and other events. | American Civil War, 2003 Wimbledon Championships, Cannes Film Festival. |
| **FOOD (food)** | Foods and drinks. | Carbonara, Sangiovese, Cheddar Beer Fondue, Pizza Margherita. |
| **INST (instrument)** | Technological instruments, mechanical instruments, musical instruments, and other tools. | Spitzer Space Telescope, Commodore 64, Skype, Apple Watch, Fender Stratocaster. |
| **MEDIA (media)** | Titles of films, books, magazines, songs and albums, fictional characters and languages. | Forbes, American Psycho, Kiss Me Once, Twin Peaks, Disney Adventures. |
| **PLANT (plant)** | Types of trees, flowers, and other plants, including their scientific names. | Salix, Quercus Petraea, Douglas Fir, Forsythia, Artemisia Maritima. |
| **MYTH (mythological)** | Mythological and religious entities. | Apollo, Persephone, Aphrodite, Saint Peter, Pope Gregory I, Hercules. |
| **TIME (time)** | Specific and well-defined time intervals, such as eras, historical periods, centuries, years and important days. No months and days of the week. | Renaissance, Middle Ages, Christmas, Great Depression, 17th Century, 2012. |
| **VEHI (vehicle)** | Cars, motorcycles and other vehicles. | Ferrari Testarossa, Suzuki Jimny, Honda CR-X, Boeing 747, Fairey Fulmar. |

## Inference Example

```python
# install the package first with: pip install span_marker
from span_marker import SpanMarkerModel

model = SpanMarkerModel.from_pretrained("lxyuan/span-marker-bert-base-multilingual-uncased-multinerd")

description = "Singapore is renowned for its hawker centers offering dishes \
like Hainanese chicken rice and laksa, while Malaysia boasts dishes such as \
nasi lemak and rendang, reflecting its rich culinary heritage."

entities = model.predict(description)

entities
>>>
[
{'span': 'Singapore', 'label': 'LOC', 'score': 0.9999247789382935, 'char_start_index': 0, 'char_end_index': 9},
{'span': 'laksa', 'label': 'FOOD', 'score': 0.794235348701477, 'char_start_index': 93, 'char_end_index': 98},
{'span': 'Malaysia', 'label': 'LOC', 'score': 0.9999157190322876, 'char_start_index': 106, 'char_end_index': 114}
]

# missed: Hainanese chicken rice as FOOD
# missed: nasi lemak as FOOD
# missed: rendang as FOOD

# note: unfortunately, this uncased version still fails to pick up those commonly
# lowercased food entities, and even misses the capitalized `Hainanese chicken rice` entity.
```

### Quick test on Chinese

```python
from span_marker import SpanMarkerModel

model = SpanMarkerModel.from_pretrained("lxyuan/span-marker-bert-base-multilingual-uncased-multinerd")

# Chinese translation of the English description used above
zh_description = "新加坡因其小贩中心提供海南鸡饭和叻沙等菜肴而闻名, 而马来西亚则拥有椰浆饭和仁当等菜肴,反映了其丰富的烹饪传统."

entities = model.predict(zh_description)

entities
>>>
[
{'span': '新加坡', 'label': 'LOC', 'score': 0.8477746248245239, 'char_start_index': 0, 'char_end_index': 3},
{'span': '马来西亚', 'label': 'LOC', 'score': 0.7525337934494019, 'char_start_index': 27, 'char_end_index': 31}
]

# It only managed to capture the two countries, Singapore and Malaysia;
# all other entities were missed. This matches the prediction of the cased model:
# https://huggingface.co/lxyuan/span-marker-bert-base-multilingual-cased-multinerd
```

### Quick Comparison on FOOD Entities

In this quick comparison, we found that a trailing full stop seems to help the uncased model identify food entities,
regardless of whether they are lowercased, capitalized, or uppercased. In contrast, the cased model does not respond well
to full stops, and adding one lowers its prediction scores.

```python
from span_marker import SpanMarkerModel

cased_model = SpanMarkerModel.from_pretrained("lxyuan/span-marker-bert-base-multilingual-cased-multinerd")
uncased_model = SpanMarkerModel.from_pretrained("lxyuan/span-marker-bert-base-multilingual-uncased-multinerd")

# no full stop
uncased_model.predict("i love fried chicken and korea bbq")
>>> []

uncased_model.predict("i love fried chicken and korea BBQ")  # uppercase BBQ only
>>> []

uncased_model.predict("i love fried chicken and Korea BBQ")  # capitalized Korea and uppercase BBQ
>>> []

# add a full stop to get better results
uncased_model.predict("i love fried chicken and korea bbq.")
>>> [
{'span': 'fried chicken', 'label': 'FOOD', 'score': 0.6531468629837036, 'char_start_index': 7, 'char_end_index': 20},
{'span': 'korea bbq', 'label': 'FOOD', 'score': 0.9738698601722717, 'char_start_index': 25, 'char_end_index': 34}
]

uncased_model.predict("i love fried chicken and korea BBQ.")
>>> [
{'span': 'fried chicken', 'label': 'FOOD', 'score': 0.6531468629837036, 'char_start_index': 7, 'char_end_index': 20},
{'span': 'korea BBQ', 'label': 'FOOD', 'score': 0.9738698601722717, 'char_start_index': 25, 'char_end_index': 34}
]

uncased_model.predict("i love fried chicken and Korea BBQ.")
>>> [
{'span': 'fried chicken', 'label': 'FOOD', 'score': 0.6531468629837036, 'char_start_index': 7, 'char_end_index': 20},
{'span': 'Korea BBQ', 'label': 'FOOD', 'score': 0.9738698601722717, 'char_start_index': 25, 'char_end_index': 34}
]


# no full stop
cased_model.predict("i love fried chicken and korea bbq")
>>> [
{'span': 'korea bbq', 'label': 'FOOD', 'score': 0.5054221749305725, 'char_start_index': 25, 'char_end_index': 34}
]

cased_model.predict("i love fried chicken and korea BBQ")
>>> [
{'span': 'korea BBQ', 'label': 'FOOD', 'score': 0.6987857222557068, 'char_start_index': 25, 'char_end_index': 34}
]

cased_model.predict("i love fried chicken and Korea BBQ")
>>> [
{'span': 'Korea BBQ', 'label': 'FOOD', 'score': 0.9755308032035828, 'char_start_index': 25, 'char_end_index': 34}
]

# adding a full stop hurts the cased model's predictions
cased_model.predict("i love fried chicken and korea bbq.")
>>> []

cased_model.predict("i love fried chicken and korea BBQ.")
>>> [
{'span': 'korea BBQ', 'label': 'FOOD', 'score': 0.5078140497207642, 'char_start_index': 25, 'char_end_index': 34}
]

cased_model.predict("i love fried chicken and Korea BBQ.")
>>> [
{'span': 'Korea BBQ', 'label': 'FOOD', 'score': 0.895089328289032, 'char_start_index': 25, 'char_end_index': 34}
]
```

## Training procedure

One can reproduce these results by running the official [training script](https://huggingface.co/tomaarsen/span-marker-mbert-base-multinerd/blob/main/train.py); a sketch of the main pieces, with our changes, is shown below.
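
As a rough guide only: the learning rate, batch size, sequence lengths, and label order below are assumptions (check the official script for the exact values); only `num_train_epochs=3` and `gradient_accumulation_steps=2` reflect the changes described in the note above.

```python
# Minimal training sketch, NOT the official train.py. Values marked as
# assumptions should be verified against the official training script.
from datasets import load_dataset
from transformers import TrainingArguments
from span_marker import SpanMarkerModel, Trainer

dataset = load_dataset("Babelscape/multinerd")

# IOB2 labels over the 15 MultiNERD classes; the order must match the
# dataset's integer tags (assumption: verify against the official script).
classes = ["PER", "ORG", "LOC", "ANIM", "BIO", "CEL", "DIS", "EVE", "FOOD",
           "INST", "MEDIA", "PLANT", "MYTH", "TIME", "VEHI"]
labels = ["O"] + [f"{prefix}-{c}" for c in classes for prefix in ("B", "I")]

model = SpanMarkerModel.from_pretrained(
    "bert-base-multilingual-uncased",  # uncased encoder, unlike the original run
    labels=labels,
    model_max_length=256,   # assumption
    entity_max_length=8,    # assumption
)

args = TrainingArguments(
    output_dir="span-marker-bert-base-multilingual-uncased-multinerd",
    learning_rate=5e-5,                # assumption
    per_device_train_batch_size=32,    # assumption
    gradient_accumulation_steps=2,     # changed from the original run
    num_train_epochs=3,                # changed from the original run
    evaluation_strategy="steps",
    eval_steps=1000,
    save_strategy="steps",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
```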

### Training hyperparameters

The following hyperparameters were used during training:

### Framework versions

- Transformers 4.30.2
- Pytorch 2.0.1+cu117
- Datasets 2.14.3
- Tokenizers 0.13.3