poccio committed
Commit: 22e4917
Parent(s): 90a51b6

initial commit

Files changed (2):
  1. app.py +26 -12
  2. requirements.txt +1 -1
app.py CHANGED
@@ -36,7 +36,10 @@ def main(
     )

     # setup header
-    st.markdown("<h1 style='text-align: center;'>ExtEnD: Extractive Entity Disambiguation</h1>", unsafe_allow_html=True)
+    st.markdown(
+        "<h1 style='text-align: center;'>ExtEnD: Extractive Entity Disambiguation</h1>",
+        unsafe_allow_html=True,
+    )
     st.write(
         """
         <div align="center">
@@ -52,7 +55,6 @@ def main(
     )

     def model_demo():
-
         @st.cache(allow_output_mutation=True)
         def load_resources(inventory_path):

@@ -83,7 +85,9 @@ def main(
         # custom inventory
         uploaded_inventory_path = st.file_uploader(
             "[Optional] Upload custom inventory (tsv file, mention \\t desc1 \\t desc2 \\t)",
-            accept_multiple_files=False, type=["tsv"])
+            accept_multiple_files=False,
+            type=["tsv"],
+        )
         if uploaded_inventory_path is not None:
             inventory_path = f"data/inventories/{uploaded_inventory_path.name}"
             with open(inventory_path, "wb") as f:
@@ -91,11 +95,11 @@ def main(
         else:
             inventory_path = default_inventory_path

-        if st.button("Classify", key="classify"):
+        # load model and color generator
+        nlp = load_resources(inventory_path)
+        color_generator = get_md_200_random_color_generator()

-            # load model
-            nlp = load_resources(inventory_path)
-            color_generator = get_md_200_random_color_generator()
+        if st.button("Disambiguate", key="classify"):

             # tag sentence
             time_start = time.perf_counter()
@@ -149,8 +153,11 @@ def main(

     def hiw():
         st.markdown("ExtEnD frames Entity Disambiguation as a text extraction problem:")
-        st.image("data/repo-assets/extend_formulation.png", caption="ExtEnD Formulation")
-        st.markdown("""
+        st.image(
+            "data/repo-assets/extend_formulation.png", caption="ExtEnD Formulation"
+        )
+        st.markdown(
+            """
             Given the sentence *After a long fight Superman saved Metropolis*, where *Superman* is the mention
             to disambiguate, ExtEnD first concatenates the descriptions of all the possible candidates of *Superman* in the
             inventory and then selects the span whose description best suits the mention in its context.
@@ -158,16 +165,23 @@ def main(
             To convert this task to end2end entity linking, as we do in *Model demo*, we leverage spaCy
             (more specifically, its NER) and run ExtEnD on each named entity spaCy identifies
             (if the corresponding mention is contained in the inventory).
-        """)
+            """
+        )

     def abstract():
         st.write(
             """
-            Word Sense Disambiguation (WSD) is a historical NLP task aimed at linking words in contexts to discrete sense inventories and it is usually cast as a multi-label classification task. Recently, several neural approaches have employed sense definitions to better represent word meanings. Yet, these approaches do not observe the input sentence and the sense definition candidates all at once, thus potentially reducing the model performance and generalization power. We cope with this issue by reframing WSD as a span extraction problem --- which we called Extractive Sense Comprehension (ESC) --- and propose ESCHER, a transformer-based neural architecture for this new formulation. By means of an extensive array of experiments, we show that ESC unleashes the full potential of our model, leading it to outdo all of its competitors and to set a new state of the art on the English WSD task. In the few-shot scenario, ESCHER proves to exploit training data efficiently, attaining the same performance as its closest competitor while relying on almost three times fewer annotations. Furthermore, ESCHER can nimbly combine data annotated with senses from different lexical resources, achieving performances that were previously out of everyone's reach. The model along with data is available at https://github.com/SapienzaNLP/esc.
+            Local models for Entity Disambiguation (ED) have today become extremely powerful, in most part thanks to the advent of large pre-trained language models. However, despite their significant performance achievements, most of these approaches frame ED through classification formulations that have intrinsic limitations, both computationally and from a modeling perspective. In contrast with this trend, here we propose ExtEnD, a novel local formulation for ED where we frame this task as a text extraction problem, and present two Transformer-based architectures that implement it. Based on experiments in and out of domain, and training over two different data regimes, we find our approach surpasses all its competitors in terms of both data efficiency and raw performance. ExtEnD outperforms its alternatives by as few as 6 F1 points on the more constrained of the two data regimes and, when moving to the other higher-resourced regime, sets a new state of the art on 4 out of 6 benchmarks under consideration, with average improvements of 0.7 F1 points overall and 1.1 F1 points out of domain. In addition, to gain better insights from our results, we also perform a fine-grained evaluation of our performances on different classes of label frequency, along with an ablation study of our architectural choices and an error analysis. We release our code and models for research purposes at https://github.com/SapienzaNLP/extend.
+
+            Link to full paper: https://www.researchgate.net/publication/359392427_ExtEnD_Extractive_Entity_Disambiguation
             """
         )

-    tabs = dict(model=("Model demo", model_demo), hiw=("How it works", hiw), abstract=("Abstract", abstract))
+    tabs = dict(
+        model=("Model demo", model_demo),
+        hiw=("How it works", hiw),
+        abstract=("Abstract", abstract),
+    )

     tabbed_navigation(tabs, "model")

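A note on the custom inventory introduced above: the uploader label describes a tab-separated layout with the mention in the first column followed by its candidate descriptions. Below is a minimal sketch of how such a file could be produced, assuming that layout; the mentions and descriptions are illustrative placeholders, not data from the repository.

import csv

# Illustrative candidate inventory: mention -> candidate entity descriptions.
inventory = {
    "Superman": [
        "Superman, a fictional superhero appearing in American comic books.",
        "Superman (1978 film), a superhero film directed by Richard Donner.",
    ],
    "Metropolis": [
        "Metropolis, the fictional city in which Superman operates.",
        "Metropolis (1927 film), a German expressionist science-fiction film.",
    ],
}

# Write one mention per row: mention \t desc1 \t desc2 \t ...
with open("custom_inventory.tsv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t")
    for mention, descriptions in inventory.items():
        writer.writerow([mention, *descriptions])

The resulting custom_inventory.tsv can then be uploaded through the optional file uploader in the Model demo tab.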
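The "How it works" text in the diff also describes the end2end step: spaCy's NER proposes mentions, and ExtEnD is run only on those whose surface form appears in the inventory. Below is a rough sketch of that filtering step alone, assuming the small English spaCy model is installed; the real app builds its pipeline inside load_resources, which is not shown in this diff.

import spacy

# Assumes `python -m spacy download en_core_web_sm` has been run.
nlp = spacy.load("en_core_web_sm")
inventory = {"Superman", "Metropolis"}  # illustrative mention inventory

doc = nlp("After a long fight Superman saved Metropolis")

# Keep only the named entities whose surface form is in the inventory;
# in the real app each surviving span would then be disambiguated by ExtEnD.
mentions_to_disambiguate = [ent for ent in doc.ents if ent.text in inventory]
for ent in mentions_to_disambiguate:
    print(ent.text, ent.label_, ent.start_char, ent.end_char)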
requirements.txt CHANGED
@@ -1 +1 @@
-git+git://github.com/sapienzanlp/extend
+git+https://github.com/sapienzanlp/extend@main
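On the requirements.txt change: the new line switches from the unauthenticated git:// protocol, which GitHub has since stopped serving, to pip's HTTPS VCS syntax, and pins the install to the main branch. For reference, the @<ref> suffix accepts a branch, tag, or commit; the second line below uses a placeholder commit reference and is purely illustrative.

git+https://github.com/sapienzanlp/extend@main          # track the main branch
git+https://github.com/sapienzanlp/extend@<commit-sha>  # pin to an exact commit (placeholder ref)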