---
license: mit
datasets:
- numind/NuNER
library_name: gliner
language:
- en
pipeline_tag: token-classification
tags:
- entity recognition
- NER
- named entity recognition
- zero shot
- zero-shot
---

NuNerZero is a family of zero-shot entity recognition models inspired by [GLiNER](https://huggingface.co/papers/2311.08526) and built with insights we gathered throughout our work on [NuNER](https://huggingface.co/collections/numind/nuner-token-classification-and-ner-backbones-65e1f6e14639e2a465af823b).

NuNerZero span is:
* a more powerful version of GLiNER-large-v2.1, surpassing it by **+4.5% on average**
* trained on the NuNER v2.0 dataset, a **diverse dataset tailored for real-life use cases**

<p align="center">
<img src="zero_shot_performance_span.png">
</p>

## Installation & Usage

```
pip install gliner
```

**NuZero requires labels to be lower-cased**

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("numind/NuNerZero_span")

# NuZero requires labels to be lower-cased!
labels = ["person", "award", "date", "competitions", "teams"]
labels = [l.lower() for l in labels]

# put the text you want to extract entities from between the quotes
text = """

"""

entities = model.predict_entities(text, labels)

for entity in entities:
    print(entity["text"], "=>", entity["label"])
```
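The entities returned by `predict_entities` are plain dictionaries. A minimal, self-contained sketch of inspecting them, assuming the usual GLiNER output keys (`text`, `label`, `start`, `end`, `score`) and the optional `threshold` argument (the example sentence is purely illustrative):

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("numind/NuNerZero_span")

# NuZero requires labels to be lower-cased!
labels = ["person", "organization", "date"]

# illustrative example text, not from the card
text = "Alice Smith joined NuMind in March 2024."

# threshold filters out low-confidence spans (assumed GLiNER keyword, default ~0.5)
entities = model.predict_entities(text, labels, threshold=0.5)

# each entity dict is assumed to carry character offsets and a confidence score
for entity in sorted(entities, key=lambda e: e["start"]):
    print(f'{entity["text"]} => {entity["label"]} ({entity["score"]:.2f})')
```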

## Fine-tuning

A fine-tuning script can be found [here](https://colab.research.google.com/drive/1fu15tWCi0SiQBBelwB-dUZDZu0RVfx_a?usp=sharing).
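As a rough orientation before opening the notebook, GLiNER-style training scripts commonly consume examples as pre-tokenized text plus token-level spans. The field names below are an assumption based on GLiNER's public training examples, not taken from this card; check the linked notebook for the exact schema it expects:

```python
# One training example in the JSON-like format GLiNER training scripts commonly use
# (assumed schema; verify against the linked Colab notebook).
example = {
    # the input sentence, pre-tokenized into words
    "tokenized_text": ["Alice", "Smith", "joined", "NuMind", "in", "March", "2024", "."],
    # entity spans as [start_token, end_token, label] with inclusive token indices
    "ner": [
        [0, 1, "person"],
        [3, 3, "organization"],
        [5, 6, "date"],
    ],
}

train_data = [example]  # a real dataset would contain many such dicts
```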


## Citation
### This work
```bibtex
@misc{bogdanov2024nuner,
      title={NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data}, 
      author={Sergei Bogdanov and Alexandre Constantin and Timothée Bernard and Benoit Crabbé and Etienne Bernard},
      year={2024},
      eprint={2402.15343},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
### Previous work
```bibtex
@misc{zaratiana2023gliner,
      title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer}, 
      author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
      year={2023},
      eprint={2311.08526},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```