Relik
English
relik-ie committed on
Commit 578492a • 1 Parent(s): 283995a

Update README.md

  1. README.md +182 -3
README.md CHANGED
@@ -1,3 +1,182 @@
- ---
- license: apache-2.0
- ---
+ ---
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - relik
+ ---
+
+ <div align="center">
+ <img src="https://github.com/SapienzaNLP/relik/blob/main/relik.png?raw=true" height="150">
+ <img src="https://github.com/SapienzaNLP/relik/blob/main/Sapienza_Babelscape.png?raw=true" height="50">
+ </div>
+
+ <div align="center">
+ <h1>Retrieve, Read and LinK: Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget</h1>
+ </div>
+
+ <div style="display:flex; justify-content: center; align-items: center; flex-direction: row;">
+ <a href="https://2024.aclweb.org/"><img src="http://img.shields.io/badge/ACL-2024-4b44ce.svg"></a> &nbsp; &nbsp;
+ <a href="https://aclanthology.org/"><img src="http://img.shields.io/badge/paper-ACL--anthology-B31B1B.svg"></a> &nbsp; &nbsp;
+ <a href="https://arxiv.org/abs/2408.00103"><img src="https://img.shields.io/badge/arXiv-b31b1b.svg"></a>
+ </div>
+ <div style="display:flex; justify-content: center; align-items: center; flex-direction: row;">
+ <a href="https://huggingface.co/collections/sapienzanlp/relik-retrieve-read-and-link-665d9e4a5c3ecba98c1bef19"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Collection-FCD21D"></a> &nbsp; &nbsp;
+ <a href="https://github.com/SapienzaNLP/relik"><img src="https://img.shields.io/badge/GitHub-Repo-121013?logo=github&logoColor=white"></a> &nbsp; &nbsp;
+ <a href="https://github.com/SapienzaNLP/relik/releases"><img src="https://img.shields.io/github/v/release/SapienzaNLP/relik"></a>
+ </div>
+
+ This card is for a **closed Information Extraction** model trained for both **Entity Linking** and **Relation Extraction**. Inference takes three forward passes: two for the Retrievers (one per task) and one for the Reader. The relation predictions are Wikidata properties.
+
+ ReLiK is a fast, lightweight Information Extraction model for **Entity Linking** and **Relation Extraction**.
+
+ ## 🛠️ Installation
+
+ Installation from PyPI:
+
+ ```bash
+ pip install relik
+ ```
+
+ <details>
+ <summary>Other installation options</summary>
+
+ #### Install with optional dependencies
+
+ Install with all the optional dependencies.
+
+ ```bash
+ pip install relik[all]
+ ```
+
+ Install with optional dependencies for training and evaluation.
+
+ ```bash
+ pip install relik[train]
+ ```
+
+ Install with optional dependencies for [FAISS](https://github.com/facebookresearch/faiss).
+
+ The FAISS PyPI package is only available for CPU. For GPU, install it from source or use the conda package.
+
+ For CPU:
+
+ ```bash
+ pip install relik[faiss]
+ ```
+
+ For GPU:
+
+ ```bash
+ conda create -n relik python=3.10
+ conda activate relik
+
+ # install pytorch
+ conda install -y pytorch=2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia
+
+ # GPU
+ conda install -y -c pytorch -c nvidia faiss-gpu=1.8.0
+ # or GPU with NVIDIA RAFT
+ conda install -y -c pytorch -c nvidia -c rapidsai -c conda-forge faiss-gpu-raft=1.8.0
+
+ pip install relik
+ ```
+
+ Install with optional dependencies for serving the models with
+ [FastAPI](https://fastapi.tiangolo.com/) and [Ray](https://docs.ray.io/en/latest/serve/quickstart.html).
+
+ ```bash
+ pip install relik[serve]
+ ```
+
+ #### Installation from source
+
+ ```bash
+ git clone https://github.com/SapienzaNLP/relik.git
+ cd relik
+ pip install -e .[all]
+ ```
+
+ </details>
+
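To see which of the optional extras listed above are usable in the current environment, a small standard-library sketch can probe for the underlying modules. The module names `faiss`, `fastapi`, and `ray` are assumptions about what the `faiss` and `serve` extras provide, and the helper `available_extras` is hypothetical, not part of ReLiK.

```python
from importlib.util import find_spec

# Assumed mapping from relik extras to the top-level modules they provide.
EXTRA_MODULES = {
    "faiss": ["faiss"],
    "serve": ["fastapi", "ray"],
}


def available_extras(extra_modules=EXTRA_MODULES):
    """Return {extra: bool}, True when every module of the extra is importable."""
    return {
        extra: all(find_spec(name) is not None for name in modules)
        for extra, modules in extra_modules.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"relik[{extra}]: {'available' if ok else 'missing'}")
```

A `missing` result only means the module cannot be found on the import path; it says nothing about whether a GPU build of FAISS is installed.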
+ ## 🚀 Quick Start
+
+ [//]: # (Write a short description of the model and how to use it with the `from_pretrained` method.)
+
+ ReLiK is a lightweight and fast model for **Entity Linking** and **Relation Extraction**.
+ It is composed of two main components: a retriever and a reader.
+ The retriever is responsible for retrieving relevant documents from a large collection,
+ while the reader is responsible for extracting entities and relations from the retrieved documents.
+ ReLiK can be used with the `from_pretrained` method to load a pre-trained pipeline.
+
+ Here is an example of how to use ReLiK for **Entity Linking**:
+
+ ```python
+ from relik import Relik
+ from relik.inference.data.objects import RelikOutput
+
+ relik = Relik.from_pretrained("sapienzanlp/relik-entity-linking-large")
+ relik_out: RelikOutput = relik("Michael Jordan was one of the best players in the NBA.")
+ ```
+
+     RelikOutput(
+         text="Michael Jordan was one of the best players in the NBA.",
+         tokens=['Michael', 'Jordan', 'was', 'one', 'of', 'the', 'best', 'players', 'in', 'the', 'NBA', '.'],
+         id=0,
+         spans=[
+             Span(start=0, end=14, label="Michael Jordan", text="Michael Jordan"),
+             Span(start=50, end=53, label="National Basketball Association", text="NBA"),
+         ],
+         triples=[],
+         candidates=Candidates(
+             span=[
+                 [
+                     [
+                         {"text": "Michael Jordan", "id": 4484083},
+                         {"text": "National Basketball Association", "id": 5209815},
+                         {"text": "Walter Jordan", "id": 2340190},
+                         {"text": "Jordan", "id": 3486773},
+                         {"text": "50 Greatest Players in NBA History", "id": 1742909},
+                         ...
+                     ]
+                 ]
+             ]
+         ),
+     )
+
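The printed output suggests that each linked mention carries character offsets (`start`, `end`), the linked entity title (`label`), and the surface form (`text`). The sketch below mirrors those fields with a stand-in `Span` dataclass so that it runs without downloading a model. Both the dataclass and the `describe_spans` helper are illustrative assumptions, not ReLiK classes; with a real `RelikOutput`, the same loop would presumably iterate over `relik_out.spans`.

```python
from dataclasses import dataclass


@dataclass
class Span:
    # Stand-in with the fields shown in the printed output above.
    start: int
    end: int
    label: str
    text: str


def describe_spans(text, spans):
    """Return one 'surface [start:end] -> entity' line per span, checking offsets."""
    lines = []
    for span in spans:
        # Offsets in the example output are character-based and end-exclusive.
        surface = text[span.start:span.end]
        if surface != span.text:
            raise ValueError(f"offset mismatch for {span!r}")
        lines.append(f"{surface} [{span.start}:{span.end}] -> {span.label}")
    return lines


text = "Michael Jordan was one of the best players in the NBA."
spans = [
    Span(start=0, end=14, label="Michael Jordan", text="Michael Jordan"),
    Span(start=50, end=53, label="National Basketball Association", text="NBA"),
]
for line in describe_spans(text, spans):
    print(line)
```

Slicing the input text with the offsets, rather than trusting the `text` field alone, is a cheap way to confirm that the spans refer to the string that was actually passed in.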
+ ## 📊 Performance
+
+ We evaluate the performance of ReLiK on Entity Linking using [GERBIL](http://gerbil-qa.aksw.org/gerbil/). The following table shows the results (InKB Micro F1) of ReLiK Large and Base:
+
+ | Model | AIDA | MSNBC | Der | K50 | R128 | R500 | O15 | O16 | Tot | OOD | AIT (m:s) |
+ |------------------------------------------|------|-------|------|------|------|------|------|------|------|------|------------|
+ | GENRE | 83.7 | 73.7 | 54.1 | 60.7 | 46.7 | 40.3 | 56.1 | 50.0 | 58.2 | 54.5 | 38:00 |
+ | EntQA | 85.8 | 72.1 | 52.9 | 64.5 | **54.1** | 41.9 | 61.1 | 51.3 | 60.5 | 56.4 | 20:00 |
+ | [ReLiK<sub>Base</sub>](https://huggingface.co/sapienzanlp/relik-entity-linking-base) | 85.3 | 72.3 | 55.6 | 68.0 | 48.1 | 41.6 | 62.5 | 52.3 | 60.7 | 57.2 | 00:29 |
+ | ➡️ [ReLiK<sub>Large</sub>](https://huggingface.co/sapienzanlp/relik-entity-linking-large) | **86.4** | **75.0** | **56.3** | **72.8** | 51.7 | **43.0** | **65.1** | **57.2** | **63.4** | **60.2** | 01:46 |
+
+ Evaluation of the comparison systems (InKB Micro F1) on the *in-domain* AIDA test set and the *out-of-domain* MSNBC, Derczynski (Der), KORE50 (K50), N3-Reuters-128 (R128),
+ N3-RSS-500 (R500), OKE-15 (O15), and OKE-16 (O16) test sets. **Bold** indicates the best model.
+ GENRE uses mention dictionaries.
+ The AIT column shows the time in minutes and seconds (m:s) that the systems need to process the whole AIDA test set using an NVIDIA RTX 4090,
+ except for EntQA, which does not fit in 24GB of RAM and for which an A100 is used.
+
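The AIT column can be turned into approximate relative timings. The sketch below parses the `m:s` values copied from the table and expresses each system's time as a multiple of a reference system. The helper names are hypothetical, and the EntQA figure is not directly comparable because it was measured on different hardware.

```python
def to_seconds(mmss):
    """Convert an 'm:s' string such as '01:46' into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# AIT values copied from the table above.
AIT = {
    "GENRE": "38:00",
    "EntQA": "20:00",  # measured on an A100, not on the RTX 4090
    "ReLiK Base": "00:29",
    "ReLiK Large": "01:46",
}


def slowdown_vs(reference, timings=AIT):
    """Return each system's AIT divided by the reference system's AIT."""
    ref = to_seconds(timings[reference])
    return {name: to_seconds(t) / ref for name, t in timings.items()}


for name, factor in slowdown_vs("ReLiK Base").items():
    print(f"{name}: {factor:.1f}x the time of ReLiK Base")
```

These ratios only restate the table; they are not new measurements.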
+ ## 🤖 Models
+
+ Models can be found on [🤗 Hugging Face](https://huggingface.co/collections/sapienzanlp/relik-retrieve-read-and-link-665d9e4a5c3ecba98c1bef19).
+
+ ## 💽 Cite this work
+
+ If you use any part of this work, please consider citing the paper as follows:
+
+ ```bibtex
+ @inproceedings{orlando-etal-2024-relik,
+     title = "Retrieve, Read and LinK: Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget",
+     author = "Orlando, Riccardo and Huguet Cabot, Pere-Llu{\'\i}s and Barba, Edoardo and Navigli, Roberto",
+     booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
+     month = aug,
+     year = "2024",
+     address = "Bangkok, Thailand",
+     publisher = "Association for Computational Linguistics",
+ }
+ ```