---
tags:
- spacy
- token-classification
language:
- tl
license: mit
datasets:
- ljvmiranda921/tlunified-ner
library_name: spacy
pipeline_tag: token-classification
model-index:
- name: >-
    Medium-sized calamanCy pipeline by L.J. Miranda
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      type: tlunified-ner
      name: TLUnified-NER
      split: test
      revision: 3f7dab9d232414ec6204f8d6934b9a35f90a254f
    metrics:
    - type: f1
      value: 0.8767
      name: F1
---
calamanCy: Tagalog NLP pipelines in spaCy

Paper: arxiv.org/abs/2311.07171

| Feature | Description |
| --- | --- |
| **Name** | `tl_calamancy_md` |
| **Version** | `0.1.0` |
| **spaCy** | `>=3.5.0,<4.0.0` |
| **Default Pipeline** | `tok2vec`, `tagger`, `morphologizer`, `parser`, `ner` |
| **Components** | `tok2vec`, `tagger`, `morphologizer`, `parser`, `ner` |
| **Vectors** | -1 keys, 50000 unique vectors (200 dimensions) |
| **Sources** | [TLUnified dataset](https://aclanthology.org/2022.lrec-1.703/) (Jan Christian Blaise Cruz and Charibeth Cheng)<br />[UD_Tagalog-TRG](https://universaldependencies.org/treebanks/tl_trg/index.html) (Stephanie Samson, Daniel Zeman, and Mary Ann C. Tan)<br />[UD_Tagalog-Ugnayan](https://universaldependencies.org/treebanks/tl_ugnayan/index.html) (Angelina Aquino) |
| **License** | `MIT` |
| **Author** | [Lester James V. Miranda](https://github.com/ljvmiranda921/calamanCy) |
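
The table above names the package (`tl_calamancy_md`) and its components. As a minimal usage sketch, assuming spaCy (`>=3.5.0,<4.0.0`) and the model package are already installed in the current environment, the pipeline can be loaded by name and its `ner` component read through `doc.ents`. The helper names `load_pipeline` and `extract_entities` and the sample sentence are illustrative and are not part of calamanCy.

```python
from typing import Callable, List, Tuple


def load_pipeline(name: str = "tl_calamancy_md"):
    """Load the installed pipeline by its package name.

    Assumption: spaCy and the model package are installed; this card does
    not describe the installation step, so it is not shown here.
    """
    import spacy  # imported lazily so extract_entities has no hard dependency

    return spacy.load(name)


def extract_entities(nlp: Callable, text: str) -> List[Tuple[str, str]]:
    """Run the pipeline on `text` and return (entity text, label) pairs.

    With this model the labels come from the `ner` component's scheme:
    LOC, ORG, or PER.
    """
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


# Example (requires the model to be installed):
# nlp = load_pipeline()
# print(extract_entities(nlp, "Pumunta si Juan sa Maynila."))
```

The other components are read from token attributes in the same `doc`, for example `token.tag_` (tagger), `token.morph` (morphologizer), and `token.dep_` (parser).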

### Label Scheme

<details>

<summary>View label scheme (120 labels for 4 components)</summary>

| Component | Labels |
| --- | --- |
| **`tagger`** | `ADJ`, `ADJ_PART`, `ADP`, `ADV`, `ADV_PART`, `AUX`, `CCONJ`, `DET`, `DET_ADP`, `DET_PART`, `INTJ`, `NOUN`, `NOUN_PART`, `NUM`, `NUM_PART`, `PART`, `PRON`, `PRON_PART`, `PROPN`, `PUNCT`, `SCONJ`, `VERB`, `VERB_PART` |
| **`morphologizer`** | `Aspect=Perf\|Mood=Ind\|POS=VERB\|Voice=Act`, `Case=Nom\|POS=ADP`, `POS=NOUN`, `POS=PUNCT`, `Aspect=Perf\|Mood=Ind\|POS=VERB\|Voice=Pass`, `Case=Gen\|POS=ADP`, `Case=Gen\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Aspect=Imp\|Mood=Ind\|POS=VERB\|Voice=Act`, `POS=ADV\|PronType=Dem`, `Foreign=Yes\|POS=NOUN`, `Degree=Pos\|POS=ADJ`, `Case=Nom\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Nom\|Deixis=Med\|Number=Sing\|POS=PRON\|PronType=Dem`, `Gender=Masc\|POS=PROPN`, `Case=Gen\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Degree=Pos\|Link=Yes\|POS=ADJ`, `POS=ADP`, `Case=Dat\|POS=ADP`, `POS=VERB\|Polarity=Pos`, `Aspect=Hab\|POS=VERB`, `POS=SCONJ`, `Case=Nom\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Aspect=Prosp\|Mood=Ind\|POS=VERB\|Voice=Act`, `POS=ADV`, `POS=PART\|Polarity=Neg`, `Aspect=Imp\|Mood=Ind\|POS=VERB\|Voice=Pass`, `Aspect=Perf\|Mood=Ind\|POS=VERB\|Voice=Lfoc`, `POS=PROPN`, `Case=Nom\|Deixis=Prox\|Number=Sing\|POS=PRON\|PronType=Dem`, `Gender=Masc\|POS=NOUN`, `Gender=Fem\|POS=NOUN`, `Degree=Pos\|Gender=Fem\|POS=ADJ`, `Gender=Fem\|POS=PROPN`, `Case=Nom\|Clusivity=In\|Number=Dual\|POS=PRON\|Person=1\|PronType=Prs`, `Number=Plur\|POS=DET\|PronType=Ind`, `Case=Nom\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `POS=PRON\|PronType=Prs\|Reflex=Yes`, `Gender=Masc\|POS=DET\|PronType=Emp`, `Case=Nom\|POS=PRON\|PronType=Int`, `Link=Yes\|POS=NOUN`, `POS=PART\|PartType=Int`, `POS=INTJ\|Polarity=Pos`, `Link=Yes\|POS=PART\|PartType=Int`, `POS=VERB\|Polarity=Neg`, `Degree=Pos\|POS=ADJ\|PronType=Int`, `Case=Gen\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Aspect=Perf\|Mood=Ind\|POS=VERB\|PronType=Int\|Voice=Act`, `Case=Nom\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Aspect=Perf\|Mood=Ind\|POS=VERB\|PronType=Int\|Voice=Pass`, `Aspect=Perf\|Mood=Ind\|POS=VERB\|Voice=Ifoc`, `POS=ADV\|PronType=Int`, `Aspect=Prog\|Mood=Ind\|POS=VERB\|Voice=Act`, `POS=PART\|PartType=Nfh`, `Deixis=Remt\|POS=ADV\|PronType=Dem`, `Aspect=Imp\|Mood=Pot\|POS=VERB\|Voice=Act`, `Link=Yes\|POS=VERB\|Polarity=Pos`, `Link=Yes\|POS=VERB\|Polarity=Neg`, `POS=PART\|PartType=Des`, `Mood=Imp\|POS=AUX\|Polarity=Neg`, `Case=Nom\|Link=Yes\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Case=Nom\|Link=Yes\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Aspect=Prog\|Mood=Ind\|POS=VERB\|Voice=Pass`, `Aspect=Prog\|Mood=Ind\|POS=VERB\|Voice=Lfoc`, `Aspect=Prog\|Mood=Ind\|POS=VERB\|Voice=Bfoc`, `POS=DET\|PronType=Tot`, `Case=Dat\|Link=Yes\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Link=Yes\|POS=PRON\|PronType=Prs\|Reflex=Yes`, `Mood=Imp\|POS=VERB\|Voice=Act`, `Case=Dat\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Mood=Imp\|POS=VERB\|Voice=Lfoc`, `Case=Gen\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Mood=Imp\|POS=VERB\|Voice=Pass`, `Case=Gen\|Clusivity=In\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Aspect=Hab\|POS=VERB\|Voice=Pass`, `Gender=Masc\|Link=Yes\|POS=PROPN`, `Case=Gen\|Link=Yes\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Gen\|Link=Yes\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `POS=ADJ`, `POS=PART`, `POS=PRON`, `POS=VERB`, `POS=INTJ`, `POS=CCONJ`, `POS=NUM`, `POS=DET` |
| **`parser`** | `ROOT`, `advmod`, `case`, `dep`, `nmod`, `nsubj`, `obj`, `obl`, `punct` |
| **`ner`** | `LOC`, `ORG`, `PER` |

</details>
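
Each `morphologizer` label in the table is a list of `Feature=Value` pairs joined by `|`, for example `Aspect=Perf|Mood=Ind|POS=VERB|Voice=Act`. The backslashes before the pipes are only Markdown escaping for the table cell. The sketch below splits such a label into a dictionary; `parse_morph_label` is a hypothetical helper written for this card, not a spaCy or calamanCy function.

```python
from typing import Dict


def parse_morph_label(label: str) -> Dict[str, str]:
    """Split a morphologizer label into a {feature: value} dict.

    Accepts both the raw form ("A=B|C=D") and the table form, where the
    pipes are backslash-escaped for Markdown ("A=B\\|C=D").
    """
    raw = label.replace("\\|", "|")  # undo the Markdown table escaping
    features: Dict[str, str] = {}
    for pair in raw.split("|"):
        if not pair:
            continue  # ignore empty segments from stray separators
        name, _, value = pair.partition("=")
        features[name] = value
    return features


example = parse_morph_label("Aspect=Perf|Mood=Ind|POS=VERB|Voice=Act")
print(example)
# {'Aspect': 'Perf', 'Mood': 'Ind', 'POS': 'VERB', 'Voice': 'Act'}
```

On a processed `Doc`, spaCy exposes the predicted features per token through `token.morph`, so a parser like this is mainly useful for working with the label list itself.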

### Citation

```
@inproceedings{miranda-2023-calamancy,
    title = "calaman{C}y: A {T}agalog Natural Language Processing Toolkit",
    author = "Miranda, Lester James",
    booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
    month = dec,
    year = "2023",
    address = "Singapore, Singapore",
    publisher = "Empirical Methods in Natural Language Processing",
    url = "https://aclanthology.org/2023.nlposs-1.1",
    pages = "1--7",
}
```