---
license:
- cc-by-nc-sa-4.0
source_datasets:
- original
task_ids:
- word-sense-disambiguation
pretty_name: word-sense-linking-dataset
tags:
- word-sense-linking
- word-sense-disambiguation
- lexical-semantics
size_categories:
- 10K<n<100K
extra_gated_fields:
  Email: text
  Company: text
  Country: country
  I want to use this dataset for:
    type: select
    options:
      - Research
      - Education
      - label: Other
        value: other
  I agree to use this dataset for non-commercial use ONLY: checkbox
extra_gated_heading: >-
  Acknowledge our [Creative Commons Attribution-NonCommercial-ShareAlike 4.0
  International License (CC BY-NC-SA
  4.0)](https://github.com/Babelscape/WSL/wsl_data_license.txt) to access the
  repository
extra_gated_description: Our team may take 2-3 days to process your request
extra_gated_button_content: Acknowledge license
language:
- en
---
---
# Word Sense Linking: Disambiguating Outside the Sandbox
[![Conference](http://img.shields.io/badge/ACL-2024-4b44ce.svg)](https://2024.aclweb.org/)
[![Paper](http://img.shields.io/badge/paper-ACL--anthology-B31B1B.svg)](https://aclanthology.org/2024.findings-acl.851/)
[![Hugging Face Collection](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-FCD21D)](https://huggingface.co/collections/Babelscape/word-sense-linking-66ace2182bc45680964cefcb)
[![GitHub](https://img.shields.io/badge/GitHub-grey?logo=github&link=https%3A%2F%2Fgithub.com%2FBabelscape%2FWSL)](https://github.com/Babelscape/WSL)
## Model Description
The Word Sense Linking model identifies spans of text and links each one to its most suitable sense in a reference inventory. The annotations are provided as sense keys from WordNet, a large lexical database of English.
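As a rough illustration of what a WordNet sense key encodes, the sketch below splits a key into the fields described in the WordNet sense index format (`lemma%ss_type:lex_filenum:lex_id:head_word:head_id`). The `SenseKey` tuple and `parse_sense_key` helper are hypothetical names introduced here for illustration; they are not part of the `wsl` package.

```python
from typing import NamedTuple

# ss_type codes as defined by the WordNet sense key format.
SS_TYPES = {1: "noun", 2: "verb", 3: "adjective", 4: "adverb", 5: "adjective satellite"}


class SenseKey(NamedTuple):
    """Hypothetical container for the parts of a WordNet sense key."""
    lemma: str
    pos: str
    lex_filenum: int
    lex_id: int
    head_word: str
    head_id: str


def parse_sense_key(key: str) -> SenseKey:
    """Split a sense key of the form lemma%ss_type:lex_filenum:lex_id:head_word:head_id."""
    lemma, _, lex_sense = key.partition("%")
    ss_type, lex_filenum, lex_id, head_word, head_id = lex_sense.split(":")
    return SenseKey(
        lemma=lemma,
        pos=SS_TYPES[int(ss_type)],
        lex_filenum=int(lex_filenum),
        lex_id=int(lex_id),
        head_word=head_word,
        head_id=head_id,
    )


parsed = parse_sense_key("bus_driver%1:18:00::")
print(parsed.lemma, parsed.pos)
```

The last two fields are only filled for adjective satellites, which is why most keys end in `::`.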
## Installation
Installation from source:
```bash
git clone https://github.com/Babelscape/WSL
cd WSL
pip install -r requirements.txt
```
## Usage
WSL is composed of two main components: a retriever and a reader.
The retriever retrieves relevant senses from a sense inventory (e.g., WordNet),
while the reader extracts spans from the input text and links them to the retrieved senses.
Use the `from_pretrained` method to load a pre-trained WSL pipeline.
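```python
from wsl import WSL
from wsl.inference.data.objects import WSLOutput

# Load the pre-trained pipeline (retriever and reader).
wsl_model = WSL.from_pretrained("Babelscape/wsl-base")
# Run the pipeline on raw text; the result holds the linked spans and the retrieved candidates.
wsl_out: WSLOutput = wsl_model("Bus drivers drive busses for a living.")
```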

    WSLOutput(
      text='Bus drivers drive busses for a living.',
      tokens=['Bus', 'drivers', 'drive', 'busses', 'for', 'a', 'living', '.'],
      id=0,
      spans=[
        Span(start=0, end=11, label='bus driver: someone who drives a bus', text='Bus drivers'),
        Span(start=12, end=17, label='drive: operate or control a vehicle', text='drive'),
        Span(start=18, end=24, label='bus: a vehicle carrying many passengers; used for public transport', text='busses'),
        Span(start=31, end=37, label='living: the financial means whereby one lives', text='living')
      ],
      candidates=Candidates(
        candidates=[
          {"text": "bus driver: someone who drives a bus", "id": "bus_driver%1:18:00::", "metadata": {}},
          {"text": "driver: the operator of a motor vehicle", "id": "driver%1:18:00::", "metadata": {}},
          {"text": "driver: someone who drives animals that pull a vehicle", "id": "driver%1:18:02::", "metadata": {}},
          {"text": "bus: a vehicle carrying many passengers; used for public transport", "id": "bus%1:06:00::", "metadata": {}},
          {"text": "living: the financial means whereby one lives", "id": "living%1:26:00::", "metadata": {}}
        ]
      ),
    )

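To make the printed output easier to read, here is a minimal, self-contained sketch that reproduces its values with plain Python objects. The `Span` dataclass below is a hypothetical stand-in that only mirrors the fields shown above; it is not the class shipped with `wsl`. The sketch checks that the character offsets are end-exclusive and looks up the WordNet sense key for each span by matching its label against the candidate glosses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Hypothetical stand-in mirroring the Span fields printed above."""
    start: int
    end: int
    label: str
    text: str


text = "Bus drivers drive busses for a living."
spans = [
    Span(0, 11, "bus driver: someone who drives a bus", "Bus drivers"),
    Span(12, 17, "drive: operate or control a vehicle", "drive"),
    Span(18, 24, "bus: a vehicle carrying many passengers; used for public transport", "busses"),
    Span(31, 37, "living: the financial means whereby one lives", "living"),
]
# Candidates copied from the output above (metadata omitted for brevity).
candidates = [
    {"text": "bus driver: someone who drives a bus", "id": "bus_driver%1:18:00::"},
    {"text": "driver: the operator of a motor vehicle", "id": "driver%1:18:00::"},
    {"text": "driver: someone who drives animals that pull a vehicle", "id": "driver%1:18:02::"},
    {"text": "bus: a vehicle carrying many passengers; used for public transport", "id": "bus%1:06:00::"},
    {"text": "living: the financial means whereby one lives", "id": "living%1:26:00::"},
]

# Offsets are character-based and end-exclusive, so slicing recovers each surface form.
offsets_ok = all(text[s.start:s.end] == s.text for s in spans)

# Map each gloss to its sense key; a span whose gloss is not among the listed
# candidates resolves to None instead of raising a KeyError.
gloss_to_key = {c["text"]: c["id"] for c in candidates}
linked: list[tuple[str, Optional[str]]] = [(s.text, gloss_to_key.get(s.label)) for s in spans]

for surface, key in linked:
    print(f"{surface!r} -> {key}")
```

Matching on the gloss text is only one possible way to join spans and candidates; it is used here because the printed span labels carry the gloss rather than the sense key.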
## Model Performance
The tables below report the precision (P), recall (R), and F1 of our model and of the baselines on the [WSL evaluation dataset](https://huggingface.co/datasets/Babelscape/wsl).
### Validation (SE07)
| Models | P | R | F1 |
|--------------|------|--------|--------|
| BEM_SUP | 67.6 | 40.9 | 51.0 |
| BEM_HEU | 70.8 | 51.2 | 59.4 |
| ConSeC_SUP | 76.4 | 46.5 | 57.8 |
| ConSeC_HEU | **76.7** | 55.4 | 64.3 |
| **Our Model**| 73.8 | **74.9** | **74.4** |
### Test (ALL_FULL)
| Models | P | R | F1 |
|--------------|------|--------|--------|
| BEM_SUP | 74.8 | 50.7 | 60.4 |
| BEM_HEU | 76.6 | 61.2 | 68.0 |
| ConSeC_SUP | 78.9 | 53.1 | 63.5 |
| ConSeC_HEU | **80.4** | 64.3 | 71.5 |
| **Our Model**| 75.2 | **76.7** | **75.9** |
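For readers who want to sanity-check the tables, the sketch below recomputes F1 from the reported P and R values, assuming the standard definition of F1 as the harmonic mean of precision and recall. The `f1_score` helper is a name introduced here for illustration. Because the published P and R are already rounded to one decimal, a recomputed F1 can differ from the reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) for the "Our Model" rows of the two tables above.
rows = {
    "SE07": (73.8, 74.9, 74.4),
    "ALL_FULL": (75.2, 76.7, 75.9),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.2f}, reported F1 = {reported}")
```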
## Additional Information
**Licensing Information**: The contents of this repository may be used only for non-commercial research purposes under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/). Copyright of the dataset contents belongs to Babelscape.
## Citation Information
```bibtex
@inproceedings{bejgu-etal-2024-wsl,
title = "Word Sense Linking: Disambiguating Outside the Sandbox",
author = "Bejgu, Andrei Stefan and Barba, Edoardo and Procopio, Luigi and Fern{\'a}ndez-Castro, Alberte and Navigli, Roberto",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
}
```
**Contributions**: Thanks to [@andreim14](https://github.com/andreim14), [@edobobo](https://github.com/edobobo), [@poccio](https://github.com/poccio) and [@navigli](https://github.com/navigli) for adding this model.