---
license: mit
datasets:
- ljvmiranda921/tlunified-ner
language:
- tl
metrics:
- f1
tags:
- gliner
pipeline_tag: token-classification
model-index:
- name: tl_gliner_small
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      type: tlunified-ner
      name: TLUnified-NER
      split: test
      revision: 3f7dab9d232414ec6204f8d6934b9a35f90a254f
    metrics:
    - type: f1
      value: 0.854
      name: F1
---

# GLiNER (large) model finetuned on Tagalog data

This model was finetuned using the [GLiNER v2.5 suite](https://github.com/urchade/GLiNER) of models.
You can find and replicate the training pipeline on [GitHub](https://github.com/ljvmiranda921/calamanCy/tree/master/models/v0.1.0-gliner).

## Usage

```python
from gliner import GLiNER

# Load the Tagalog-finetuned GLiNER model (large variant)
model = GLiNER.from_pretrained("ljvmiranda921/tl_gliner_large")

# Sample text for entity prediction
# Reference: Leni Robredo’s speech at the 2022 UP College of Law recognition rites
text = """
Nagsimula ako sa Public Attorney’s Office, kung saan araw-araw, mula Lunes hanggang Biyernes, nasa loob ako ng iba’t ibang court room at tambak ang kaso.
Bawat Sabado, nasa BJMP ako para ihanda ang aking mga kliyente. Nahasa ako sa crim law at litigation. Pero kinalaunan, lumipat ako sa isang NGO,
‘yung Sentro ng Alternatibong Lingap Panligal. Sa SALIGAN talaga ako nahubog bilang abugado: imbes na tinatanggap na lang ang mga batas na kailangang
sundin, nagtatanong din kung ito ba ay tunay na instrumento para makapagbigay ng katarungan sa ordinaryong Pilipino. Imbes na maghintay ng mga kliyente
sa de-aircon na opisina, dinadayo namin ang mga malalayong komunidad. Kadalasan, naka-tsinelas, naka-t-shirt at maong, hinahanap namin ang mga komunidad,
tinatawid ang mga bundok, palayan, at mga ilog para tumungo sa mga lugar kung saan hirap ang mga batayang sektor na makakuha ng access to justice.
Naaalala ko pa noong naging lead lawyer ako para sa isang proyekto: sa loob ng mahigit dalawang taon, bumibiyahe ako buwan-buwan papunta sa malayong
isla ng Masbate, nagpa-paralegal training sa mga batayang sektor doon, ipinapaliwanag, itinituturo, at sinasanay sila sa mga batas na nagbibigay-proteksyon
sa mga karapatan nila.
"""

# Labels for entity prediction
# Most GLiNER models should work best when entity types are in lower case or title case
labels = ["person", "organization", "location"]

# Perform entity prediction
entities = model.predict_entities(text, labels, threshold=0.5)

# Display predicted entities and their labels
for entity in entities:
    print(entity["text"], "=>", entity["label"])

# Sample output:
# Public Attorney’s Office => organization
# BJMP => organization
# Sentro ng Alternatibong Lingap Panligal => organization
# Masbate => location
```
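If you only need a compact summary of what the model found, the list returned by `predict_entities` can be post-processed in plain Python. The sketch below assumes each prediction is a dict with at least `text`, `label`, and `score` keys. The helper `group_by_label` and its `min_score` cutoff are hypothetical additions, not part of the GLiNER API, and the scores in the sample data are invented for illustration.

```python
from collections import defaultdict


def group_by_label(entities, min_score=0.0):
    """Group predicted entities by label, dropping duplicates and low-score spans.

    Hypothetical helper. Assumes each entity is a dict with "text", "label",
    and "score" keys, like the items returned by model.predict_entities(...).
    """
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] < min_score:
            continue  # stricter post-hoc cutoff on top of the prediction threshold
        if ent["text"] not in grouped[ent["label"]]:
            grouped[ent["label"]].append(ent["text"])  # keep first-seen order
    return dict(grouped)


# Hypothetical predictions: texts and labels mirror the sample output above,
# but the scores are made up for illustration.
sample = [
    {"text": "Public Attorney’s Office", "label": "organization", "score": 0.93},
    {"text": "BJMP", "label": "organization", "score": 0.71},
    {"text": "Sentro ng Alternatibong Lingap Panligal", "label": "organization", "score": 0.88},
    {"text": "Masbate", "label": "location", "score": 0.95},
]

print(group_by_label(sample, min_score=0.8))
# → {'organization': ['Public Attorney’s Office', 'Sentro ng Alternatibong Lingap Panligal'], 'location': ['Masbate']}
```

Filtering after prediction lets you try stricter cutoffs without re-running the model, though it may not exactly match calling `predict_entities` with a higher `threshold`, since span selection happens inside that call.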


## Citation

Please cite the following papers when using these models:

```
@misc{zaratiana2023gliner,
    title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer}, 
    author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
    year={2023},
    eprint={2311.08526},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

```
@inproceedings{miranda-2023-calamancy,
  title = "calaman{C}y: A {T}agalog Natural Language Processing Toolkit",
  author = "Miranda, Lester James",
  booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
  month = dec,
  year = "2023",
  address = "Singapore, Singapore",
  publisher = "Empirical Methods in Natural Language Processing",
  url = "https://aclanthology.org/2023.nlposs-1.1",
  pages = "1--7",
} 
```

If you're using the NER dataset:

```
@inproceedings{miranda-2023-developing,
  title = "Developing a Named Entity Recognition Dataset for {T}agalog",
  author = "Miranda, Lester James",
  booktitle = "Proceedings of the First Workshop in South East Asian Language Processing",
  month = nov,
  year = "2023",
  address = "Nusa Dua, Bali, Indonesia",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.sealp-1.2",
  doi = "10.18653/v1/2023.sealp-1.2",
  pages = "13--20",
}
```