---

license: cc-by-nc-sa-4.0
---

---
license:
- cc-by-nc-sa-4.0
source_datasets:

- original

task_ids:
- word-sense-disambiguation
pretty_name: word-sense-linking-dataset

tags:

- word-sense-linking

- word-sense-disambiguation

- lexical-semantics

size_categories:
- 10K<n<100K

extra_gated_fields:

  Email: text

  Company: text

  Country: country

  I want to use this dataset for:

    type: select

    options: 

      - Research

      - Education

      - label: Other

        value: other

  I agree to use this dataset for non-commercial use ONLY: checkbox

extra_gated_heading: "Acknowledge our [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)](https://github.com/Babelscape/WSL/wsl_data_license.txt) to access the repository"

extra_gated_description: "Our team may take 2-3 days to process your request"

extra_gated_button_content: "Acknowledge license"

---

---





# Word Sense Linking: Disambiguating Outside the Sandbox



[![Conference](http://img.shields.io/badge/ACL-2024-4b44ce.svg)](https://2024.aclweb.org/)

[![Paper](http://img.shields.io/badge/paper-ACL--anthology-B31B1B.svg)](https://aclanthology.org/)

[![Hugging Face Collection](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-FCD21D)](https://huggingface.co/collections/Babelscape/word-sense-linking-66ace2182bc45680964cefcb)



## Model Description



The Word Sense Linking (WSL) model identifies spans of text and links each one to its most suitable sense in a reference inventory. The annotations are provided as sense keys from WordNet, a large lexical database of English.
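
To make the annotation format concrete, the sketch below splits a WordNet sense key into its fields, following the `lemma%ss_type:lex_filenum:lex_id:head_word:head_id` layout described in the WordNet documentation. The `SenseKey` tuple and `parse_sense_key` helper are hypothetical names introduced here for illustration; they are not part of the `wsl` package.

```python
from typing import NamedTuple

# WordNet ss_type codes (the first numeric field of a sense key).
SS_TYPES = {1: "noun", 2: "verb", 3: "adjective", 4: "adverb", 5: "adjective satellite"}


class SenseKey(NamedTuple):
    """Hypothetical container for the fields of a WordNet sense key."""
    lemma: str
    pos: str
    lex_filenum: int
    lex_id: int
    head_word: str  # only filled for adjective satellites
    head_id: str    # only filled for adjective satellites


def parse_sense_key(key: str) -> SenseKey:
    """Split a sense key of the form lemma%ss_type:lex_filenum:lex_id:head_word:head_id."""
    lemma, lex_sense = key.split("%", 1)
    ss_type, lex_filenum, lex_id, head_word, head_id = lex_sense.split(":")
    return SenseKey(
        lemma=lemma,
        pos=SS_TYPES[int(ss_type)],
        lex_filenum=int(lex_filenum),
        lex_id=int(lex_id),
        head_word=head_word,
        head_id=head_id,
    )


# Sense key taken from the usage example in this model card.
print(parse_sense_key("bus_driver%1:18:00::"))
```

Here the part-of-speech field resolves to `noun`, and the two trailing fields are empty because the sense is not an adjective satellite.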



## Installation



Installation from source:



```bash
git clone https://github.com/Babelscape/WSL
cd WSL
pip install -r requirements.txt
```







## Usage



WSL is composed of two main components: a retriever and a reader.
The retriever is responsible for retrieving relevant senses from a sense inventory (e.g., WordNet),
while the reader is responsible for extracting spans from the input text and linking them to the retrieved senses.
A pre-trained pipeline can be loaded with the `from_pretrained` method.



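```python
from wsl import WSL
from wsl.inference.data.objects import WSLOutput

# Load the pre-trained retriever + reader pipeline.
wsl_model = WSL.from_pretrained("Babelscape/wsl-base")

# Run span identification and sense linking on a sentence.
wsl_out: WSLOutput = wsl_model("Bus drivers drive busses for a living.")
```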



    WSLOutput(
        text='Bus drivers drive busses for a living.',
        tokens=['Bus', 'drivers', 'drive', 'busses', 'for', 'a', 'living', '.'],
        id=0,
        spans=[
            Span(start=0, end=11, label='bus driver: someone who drives a bus', text='Bus drivers'),
            Span(start=12, end=17, label='drive: operate or control a vehicle', text='drive'),
            Span(start=18, end=24, label='bus: a vehicle carrying many passengers; used for public transport', text='busses'),
            Span(start=31, end=37, label='living: the financial means whereby one lives', text='living')
        ],
        candidates=Candidates(
            candidates=[
                {"text": "bus driver: someone who drives a bus", "id": "bus_driver%1:18:00::", "metadata": {}},
                {"text": "driver: the operator of a motor vehicle", "id": "driver%1:18:00::", "metadata": {}},
                {"text": "driver: someone who drives animals that pull a vehicle", "id": "driver%1:18:02::", "metadata": {}},
                {"text": "bus: a vehicle carrying many passengers; used for public transport", "id": "bus%1:06:00::", "metadata": {}},
                {"text": "living: the financial means whereby one lives", "id": "living%1:26:00::", "metadata": {}}
            ]
        ),
    )
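
In the output above, each span carries a textual label, while the WordNet sense keys appear in the `id` field of the candidates. One way to recover a sense key per span is to match span labels against candidate texts. The sketch below does this on plain stand-in data copied from the printed output; the `Span` dataclass and the `sense_ids` helper are hypothetical and only mirror the printed fields, and the real `wsl` objects may expose this information differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Span:
    """Hypothetical stand-in mirroring the fields printed in the output above."""
    start: int
    end: int
    label: str
    text: str


def sense_ids(spans: List[Span], candidates: List[Dict]) -> List[Tuple[str, Optional[str]]]:
    """Pair each span's surface text with the id of the candidate whose text equals its label."""
    by_text = {c["text"]: c["id"] for c in candidates}
    # Spans whose label is not among the candidates map to None.
    return [(s.text, by_text.get(s.label)) for s in spans]


text = "Bus drivers drive busses for a living."
spans = [
    Span(0, 11, "bus driver: someone who drives a bus", "Bus drivers"),
    Span(12, 17, "drive: operate or control a vehicle", "drive"),
    Span(18, 24, "bus: a vehicle carrying many passengers; used for public transport", "busses"),
]
candidates = [
    {"text": "bus driver: someone who drives a bus", "id": "bus_driver%1:18:00::", "metadata": {}},
    {"text": "bus: a vehicle carrying many passengers; used for public transport", "id": "bus%1:06:00::", "metadata": {}},
]

# The character offsets index directly into the input text.
assert all(text[s.start:s.end] == s.text for s in spans)

linked = sense_ids(spans, candidates)
for surface, sense_id in linked:
    print(surface, "->", sense_id)
```

Returning `None` for unmatched labels keeps the lookup total, which matters here because the candidate list shown for the example does not contain an entry for the `drive` label.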







## Model Performance



The tables below report the precision (P), recall (R), and F1 of our model on the [WSL evaluation dataset](https://huggingface.co/datasets/Babelscape/wsl).



### Validation (SE07)



| Models       | P    | R      | F1     |
|--------------|------|--------|--------|
| BEM_SUP      | 67.6 | 40.9   | 51.0   |
| BEM_HEU      | 70.8 | 51.2   | 59.4   |
| ConSeC_SUP   | 76.4 | 46.5   | 57.8   |
| ConSeC_HEU   | **76.7** | 55.4   | 64.3   |
| **Our Model**| 73.8 | **74.9** | **74.4** |



### Test (ALL_FULL)



| Models       | P    | R      | F1     |
|--------------|------|--------|--------|
| BEM_SUP      | 74.8 | 50.7   | 60.4   |
| BEM_HEU      | 76.6 | 61.2   | 68.0   |
| ConSeC_SUP   | 78.9 | 53.1   | 63.5   |
| ConSeC_HEU   | **80.4** | 64.3   | 71.5   |
| **Our Model**| 75.2 | **76.7** | **75.9** |
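
As a reading aid for the tables, the sketch below recomputes F1 from P and R, assuming the F1 column is the standard harmonic mean of precision and recall (the model card does not state the exact scoring script). The `f1` helper is a hypothetical name, and the two rows are copied from the test table above.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R) pairs copied from the Test (ALL_FULL) table.
rows = {
    "BEM_SUP": (74.8, 50.7),
    "Our Model": (75.2, 76.7),
}

for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1(p, r):.1f}")
```

Because the published P and R are themselves rounded to one decimal, a value recomputed this way can differ from the reported F1 in the last digit.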







## Additional Information

**Licensing Information**: The contents of this repository are restricted to non-commercial research purposes only, under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/). Copyright of the dataset contents belongs to Babelscape.



## Citation Information





```bibtex
@inproceedings{bejgu-etal-2024-wsl,
    title     = "Word Sense Linking: Disambiguating Outside the Sandbox",
    author    = "Bejgu, Andrei Stefan and Barba, Edoardo and Procopio, Luigi and Fern{\'a}ndez-Castro, Alberte and Navigli, Roberto",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
    month     = aug,
    year      = "2024",
    address   = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
}
```

 

**Contributions**: Thanks to [@andreim14](https://github.com/andreim14), [@edobobo](https://github.com/edobobo), [@poccio](https://github.com/poccio) and [@navigli](https://github.com/navigli) for adding this model.