---
tags:
- spacy
- token-classification
language:
- zh
model-index:
- name: zh_lzh_sigtyp_trf
  results:
  - task:
      name: TAG
      type: token-classification
    metrics:
    - name: TAG (XPOS) Accuracy
      type: accuracy
      value: 0.742052984
  - task:
      name: POS
      type: token-classification
    metrics:
    - name: POS (UPOS) Accuracy
      type: accuracy
      value: 0.7949418685
  - task:
      name: MORPH
      type: token-classification
    metrics:
    - name: Morph (UFeats) Accuracy
      type: accuracy
      value: 0.8236478744
  - task:
      name: LEMMA
      type: token-classification
    metrics:
    - name: Lemma Accuracy
      type: accuracy
      value: 0.942007037
  - task:
      name: UNLABELED_DEPENDENCIES
      type: token-classification
    metrics:
    - name: Unlabeled Attachment Score (UAS)
      type: f_score
      value: 0.8228271306
  - task:
      name: LABELED_DEPENDENCIES
      type: token-classification
    metrics:
    - name: Labeled Attachment Score (LAS)
      type: f_score
      value: 0.7703219397
  - task:
      name: SENTS
      type: token-classification
    metrics:
    - name: Sentences F-Score
      type: f_score
      value: 0.9851073655
---

| Feature | Description |
| --- | --- |
| **Name** | `zh_lzh_sigtyp_trf` |
| **Version** | `0.1.0` |
| **spaCy** | `>=3.6.1,<3.7.0` |
| **Default Pipeline** | `transformer`, `parser`, `trainable_lemmatizer`, `tagger`, `morphologizer` |
| **Components** | `transformer`, `parser`, `trainable_lemmatizer`, `tagger`, `morphologizer` |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
| **Sources** | n/a |
| **License** | n/a |
| **Author** | [n/a]() |
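
As a minimal sketch of how the pipeline described above might be used, the snippet below loads it by name and collects the annotations that each listed component sets on a token. It assumes the `zh_lzh_sigtyp_trf` package is already installed in an environment with a spaCy version in the `>=3.6.1,<3.7.0` range and with `spacy-transformers` available for the `transformer` component. The helper names `load_pipeline` and `describe` and the sample sentence are illustrative only and are not part of the package.

```python
def load_pipeline(name: str = "zh_lzh_sigtyp_trf"):
    """Return the loaded pipeline, or None if spaCy or the package is missing."""
    try:
        import spacy  # imported lazily so the sketch degrades gracefully
    except ImportError:
        return None
    try:
        return spacy.load(name)
    except OSError:
        # spaCy raises OSError when the named package cannot be found
        return None


def describe(nlp, text: str):
    """Collect (text, XPOS, UPOS, morph, lemma, dep) for every token."""
    doc = nlp(text)
    return [
        (t.text, t.tag_, t.pos_, str(t.morph), t.lemma_, t.dep_)
        for t in doc
    ]


if __name__ == "__main__":
    nlp = load_pipeline()
    if nlp is None:
        print("pipeline not installed")
    else:
        # Illustrative Classical Chinese input (hypothetical example text)
        for row in describe(nlp, "學而時習之"):
            print(row)
```

Here `tag_` comes from the `tagger`, `pos_` and `morph` from the `morphologizer`, `lemma_` from the `trainable_lemmatizer`, and `dep_` from the `parser`.
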
### Label Scheme

<details>

<summary>View label scheme (272 labels for 3 components)</summary>

| Component | Labels |
| --- | --- |
| **`parser`** | `ROOT`, `acl`, `advcl`, `advmod`, `amod`, `aux`, `case`, `cc`, `ccomp`, `clf`, `compound`, `compound:redup`, `conj`, `cop`, `csubj`, `csubj:outer`, `dep`, `det`, `discourse`, `discourse:sp`, `dislocated`, `expl`, `fixed`, `flat`, `flat:foreign`, `flat:vv`, `iobj`, `list`, `mark`, `nmod`, `nsubj`, `nsubj:outer`, `nummod`, `obj`, `obl`, `obl:lmod`, `obl:tmod`, `parataxis`, `vocative`, `xcomp` |
| **`tagger`** | `n,代名詞,人称,他__Person=1\|PronType=Prs`, `n,代名詞,人称,他__Person=2\|PronType=Prs`, `n,代名詞,人称,他__Person=3\|PronType=Prs`, `n,代名詞,人称,他__PronType=Prs`, `n,代名詞,人称,他__PronType=Prs\|Reflex=Yes`, `n,代名詞,人称,止格__Person=1\|PronType=Prs`, `n,代名詞,人称,止格__Person=2\|PronType=Prs`, `n,代名詞,人称,止格__Person=3\|PronType=Prs`, `n,代名詞,人称,起格__Person=1\|PronType=Prs`, `n,代名詞,人称,起格__Person=2\|PronType=Prs`, `n,代名詞,人称,起格__Person=3\|PronType=Prs`, `n,代名詞,人称,起格__PronType=Prs`, `n,代名詞,指示,*__PronType=Dem`, `n,代名詞,疑問,*__PronType=Int`, `n,名詞,不可譲,属性`, `n,名詞,不可譲,疾病`, `n,名詞,不可譲,身体`, `n,名詞,主体,動物`, `n,名詞,主体,国名__Case=Loc\|NameType=Nat`, `n,名詞,主体,書物`, `n,名詞,主体,機関`, `n,名詞,主体,神仏`, `n,名詞,主体,集団`, `n,名詞,人,その他の人名__NameType=Prs`, `n,名詞,人,人`, `n,名詞,人,名__NameType=Giv`, `n,名詞,人,姓氏__NameType=Sur`, `n,名詞,人,役割`, `n,名詞,人,複合的人名__NameType=Prs`, `n,名詞,人,関係`, `n,名詞,制度,儀礼`, `n,名詞,制度,場__Case=Loc`, `n,名詞,可搬,乗り物`, `n,名詞,可搬,伝達`, `n,名詞,可搬,成果物`, `n,名詞,可搬,糧食`, `n,名詞,可搬,道具`, `n,名詞,固定物,地名__Case=Loc\|NameType=Geo`, `n,名詞,固定物,地形__Case=Loc`, `n,名詞,固定物,建造物__Case=Loc`, `n,名詞,固定物,樹木`, `n,名詞,固定物,関係__Case=Loc`, `n,名詞,外観,人`, `n,名詞,天象,天文`, `n,名詞,天象,怪異`, `n,名詞,天象,気象`, `n,名詞,度量衡,*__NounType=Clf`, `n,名詞,思考,*`, `n,名詞,思考,思考`, `n,名詞,描写,形質`, `n,名詞,描写,態度`, `n,名詞,数量,*`, `n,名詞,時,*__Case=Tem`, `n,名詞,行為,*`, `n,数詞,干支,*__NumType=Ord`, `n,数詞,数,*`, `n,数詞,数字,*`, `p,助詞,句末,*`, `p,助詞,句頭,*`, `p,助詞,接続,並列`, `p,助詞,接続,体言化`, `p,助詞,接続,属格`, `p,助詞,提示,*`, `p,感嘆詞,*,*`, `p,接尾辞,*,*`, `s,文字,*,*`, `s,記号,一般,*`, `s,記号,括弧開,*`, `s,記号,読点,*`, `v,前置詞,基盤,*`, `v,前置詞,源泉,*`, `v,前置詞,経由,*`, `v,前置詞,関係,*`, `v,副詞,判断,推定`, `v,副詞,判断,確定`, `v,副詞,判断,逆接`, `v,副詞,否定,体言否定__Polarity=Neg`, `v,副詞,否定,有界__Polarity=Neg`, `v,副詞,否定,無界__Polarity=Neg`, `v,副詞,否定,禁止__Polarity=Neg`, `v,副詞,描写,*`, `v,副詞,時相,変化__AdvType=Tim`, `v,副詞,時相,完了__AdvType=Tim\|Aspect=Perf`, `v,副詞,時相,将来__AdvType=Tim\|Tense=Fut`, `v,副詞,時相,恒常__AdvType=Tim`, `v,副詞,時相,現在__AdvType=Tim\|Tense=Pres`, `v,副詞,時相,終局__AdvType=Tim`, `v,副詞,時相,継起__AdvType=Tim`, `v,副詞,時相,緊接__AdvType=Tim`, `v,副詞,時相,過去__AdvType=Tim\|Tense=Past`, `v,副詞,疑問,原因__AdvType=Cau`, `v,副詞,疑問,反語`, `v,副詞,疑問,所在`, `v,副詞,程度,やや高度__AdvType=Deg\|Degree=Cmp`, `v,副詞,程度,極度__AdvType=Deg\|Degree=Sup`, `v,副詞,程度,軽度__AdvType=Deg\|Degree=Pos`, `v,副詞,範囲,共同`, `v,副詞,範囲,総括`, `v,副詞,範囲,限定`, `v,副詞,頻度,偶発`, `v,副詞,頻度,重複`, `v,副詞,頻度,頻繁`, `v,助動詞,受動,*__Voice=Pass`, `v,助動詞,可能,*__Mood=Pot`, `v,助動詞,必要,*__Mood=Nec`, `v,助動詞,願望,*__Mood=Des`, `v,動詞,変化,制度`, `v,動詞,変化,制度__VerbForm=Conv`, `v,動詞,変化,制度__VerbForm=Part`, `v,動詞,変化,性質`, `v,動詞,変化,性質__VerbForm=Conv`, `v,動詞,変化,性質__VerbForm=Part`, `v,動詞,変化,生物`, `v,動詞,変化,生物__VerbForm=Conv`, `v,動詞,変化,生物__VerbForm=Part`, `v,動詞,存在,存在`, `v,動詞,存在,存在__Polarity=Neg`, `v,動詞,存在,存在__Polarity=Neg\|VerbForm=Conv`, `v,動詞,存在,存在__Polarity=Neg\|VerbForm=Part`, `v,動詞,存在,存在__VerbForm=Conv`, `v,動詞,存在,存在__VerbForm=Part`, `v,動詞,存在,存在__VerbType=Cop`, `v,動詞,描写,境遇__Degree=Pos`, `v,動詞,描写,境遇__Degree=Pos\|VerbForm=Conv`, `v,動詞,描写,境遇__Degree=Pos\|VerbForm=Part`, `v,動詞,描写,形質__Degree=Pos`, `v,動詞,描写,形質__Degree=Pos\|VerbForm=Conv`, `v,動詞,描写,形質__Degree=Pos\|VerbForm=Part`, `v,動詞,描写,態度__Degree=Pos`, `v,動詞,描写,態度__Degree=Pos\|VerbForm=Conv`, `v,動詞,描写,態度__Degree=Pos\|VerbForm=Part`, `v,動詞,描写,量__Degree=Pos`, `v,動詞,描写,量__Degree=Pos\|VerbForm=Conv`, `v,動詞,描写,量__Degree=Pos\|VerbForm=Part`, `v,動詞,行為,交流`, `v,動詞,行為,交流__VerbForm=Conv`, `v,動詞,行為,交流__VerbForm=Part`, `v,動詞,行為,伝達`, `v,動詞,行為,伝達__VerbForm=Conv`, `v,動詞,行為,伝達__VerbForm=Part`, `v,動詞,行為,使役`, `v,動詞,行為,使役__VerbForm=Conv`, `v,動詞,行為,使役__VerbForm=Part`, `v,動詞,行為,儀礼`, `v,動詞,行為,儀礼__VerbForm=Conv`, `v,動詞,行為,儀礼__VerbForm=Part`, `v,動詞,行為,分類__Degree=Equ`, `v,動詞,行為,分類__Degree=Equ\|VerbForm=Conv`, `v,動詞,行為,分類__Degree=Equ\|VerbForm=Part`, `v,動詞,行為,動作`, `v,動詞,行為,動作__VerbForm=Conv`, `v,動詞,行為,動作__VerbForm=Part`, `v,動詞,行為,姿勢`, `v,動詞,行為,姿勢__VerbForm=Conv`, `v,動詞,行為,姿勢__VerbForm=Part`, `v,動詞,行為,役割`, `v,動詞,行為,役割__VerbForm=Conv`, `v,動詞,行為,役割__VerbForm=Part`, `v,動詞,行為,得失`, `v,動詞,行為,得失__VerbForm=Conv`, `v,動詞,行為,得失__VerbForm=Part`, `v,動詞,行為,態度`, `v,動詞,行為,態度__VerbForm=Conv`, `v,動詞,行為,態度__VerbForm=Part`, `v,動詞,行為,生産`, `v,動詞,行為,生産__VerbForm=Conv`, `v,動詞,行為,生産__VerbForm=Part`, `v,動詞,行為,移動`, `v,動詞,行為,移動__VerbForm=Conv`, `v,動詞,行為,移動__VerbForm=Part`, `v,動詞,行為,設置`, `v,動詞,行為,設置__VerbForm=Conv`, `v,動詞,行為,設置__VerbForm=Part`, `v,動詞,行為,飲食`, `v,動詞,行為,飲食__VerbForm=Conv`, `v,動詞,行為,飲食__VerbForm=Part` |
| **`morphologizer`** | `POS=VERB`, `Case=Loc\|POS=NOUN`, `POS=VERB\|Polarity=Neg`, `POS=VERB\|VerbForm=Part`, `POS=NOUN`, `POS=ADV`, `AdvType=Tim\|POS=ADV\|Tense=Fut`, `POS=PRON\|Person=3\|PronType=Prs`, `POS=PART`, `POS=NUM`, `POS=CCONJ`, `Case=Tem\|POS=NOUN`, `Degree=Pos\|POS=VERB\|VerbForm=Part`, `Degree=Pos\|POS=VERB`, `NameType=Giv\|POS=PROPN`, `POS=PRON\|Person=2\|PronType=Prs`, `POS=SCONJ`, `POS=ADV\|Polarity=Neg`, `POS=PRON\|PronType=Dem`, `POS=AUX\|VerbType=Cop`, `NameType=Sur\|POS=PROPN`, `Mood=Pot\|POS=AUX`, `POS=ADV\|VerbForm=Conv`, `POS=ADP`, `NameType=Prs\|POS=PROPN`, `AdvType=Tim\|POS=ADV`, `POS=PRON\|Person=1\|PronType=Prs`, `Degree=Pos\|POS=ADV\|VerbForm=Conv`, `Case=Loc\|NameType=Nat\|POS=PROPN`, `POS=INTJ`, `AdvType=Tim\|Aspect=Perf\|POS=ADV`, `POS=PRON\|PronType=Int`, `Case=Loc\|NameType=Geo\|POS=PROPN`, `AdvType=Cau\|POS=ADV`, `POS=PRON\|PronType=Prs\|Reflex=Yes`, `Mood=Des\|POS=AUX`, `Degree=Equ\|POS=VERB`, `AdvType=Tim\|POS=ADV\|Tense=Past`, `POS=PRON\|PronType=Prs`, `POS=SYM`, `AdvType=Deg\|Degree=Cmp\|POS=ADV`, `POS=AUX\|Voice=Pass`, `NounType=Clf\|POS=NOUN`, `POS=ADV\|Polarity=Neg\|VerbForm=Conv`, `NumType=Ord\|POS=NUM`, `POS=VERB\|Polarity=Neg\|VerbForm=Part`, `Degree=Equ\|POS=ADV\|VerbForm=Conv`, `AdvType=Tim\|POS=ADV\|Tense=Pres`, `Mood=Nec\|POS=AUX`, `AdvType=Deg\|Degree=Sup\|POS=ADV`, `Degree=Equ\|POS=VERB\|VerbForm=Part`, `AdvType=Deg\|Degree=Pos\|POS=ADV`, `Degree=Equ\|POS=ADP`, `POS=PUNCT`, `POS=PROPN`, `Degree=Pos\|POS=NOUN` |

</details>
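
Many `tagger` labels in the scheme above appear to combine a comma-separated XPOS tag with a feature string, joined by a double underscore. The following is a minimal sketch for taking such a label apart. It rests on two assumptions drawn only from the visible pattern: that `__` separates the two parts, and that the `\|` shown in the table is Markdown escaping for a plain `|` in the actual label. The function name `split_tagger_label` is hypothetical.

```python
def split_tagger_label(label: str):
    """Split 'xpos__Feat=Val|Feat=Val' into (xpos_fields, feature_dict).

    Assumes '__' joins the XPOS part and the feature part, and that
    features are '|'-separated 'Name=Value' pairs.
    """
    xpos, sep, feats = label.partition("__")
    fields = xpos.split(",")
    feat_map = {}
    if sep and feats:
        for pair in feats.split("|"):
            key, _, value = pair.partition("=")
            feat_map[key] = value
    return fields, feat_map


fields, feats = split_tagger_label("n,名詞,主体,国名__Case=Loc|NameType=Nat")
print(fields)  # ['n', '名詞', '主体', '国名']
print(feats)   # {'Case': 'Loc', 'NameType': 'Nat'}
```

Labels without a feature part, such as `v,動詞,行為,移動`, yield an empty feature dictionary.
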
### Accuracy

| Type | Score |
| --- | --- |
| `DEP_UAS` | 82.28 |
| `DEP_LAS` | 77.03 |
| `SENTS_P` | 98.08 |
| `SENTS_R` | 98.94 |
| `SENTS_F` | 98.51 |
| `LEMMA_ACC` | 94.20 |
| `TAG_ACC` | 74.21 |
| `POS_ACC` | 79.49 |
| `MORPH_ACC` | 82.36 |
| `TRANSFORMER_LOSS` | 3733506.66 |
| `PARSER_LOSS` | 1170567.44 |
| `TRAINABLE_LEMMATIZER_LOSS` | 98325.33 |
| `TAGGER_LOSS` | 2806037.87 |
| `MORPHOLOGIZER_LOSS` | 2426650.00 |
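
As a quick sanity check on the sentence-segmentation rows, `SENTS_F` should be the harmonic mean of `SENTS_P` and `SENTS_R`. The sketch below recomputes it from the rounded table values. The function name `f_score` is illustrative, and the result is only expected to agree with the table to two decimal places because the inputs are already rounded.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the SENTS_P and SENTS_R rows of the table
sents_f = f_score(98.08, 98.94)
print(round(sents_f, 2))  # 98.51
```
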
### Citation

If you're using this model, please cite:

```
@inproceedings{miranda-2024-allen,
    title = "{A}llen Institute for {AI} @ {SIGTYP} 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages",
    author = "Miranda, Lester James",
    booktitle = "Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual NLP",
    month = mar,
    year = "2024",
    address = "St. Julian's, Malta",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.sigtyp-1.18",
    pages = "151--159",
}
```