KennethEnevoldsen committed on
Commit
fe85a1c
1 Parent(s): 5cfbb2b

Updated model to v 0.2.0

.gitattributes CHANGED
@@ -20,3 +20,4 @@
  *strings.json filter=lfs diff=lfs merge=lfs -text
  vectors filter=lfs diff=lfs merge=lfs -text
  model filter=lfs diff=lfs merge=lfs -text
+ entity_linker/kb/* filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,89 +1,38 @@
1
- ---
2
- tags:
3
- - spacy
4
- - token-classification
5
- language:
6
- - da
7
- license: apache-2.0
8
- datasets:
9
- - dane
10
- model-index:
11
- - name: da_dacy_large_trf
12
- results:
13
- - task:
14
- name: NER
15
- type: token-classification
16
- metrics:
17
- - name: NER Precision
18
- type: precision
19
- value: 0.8902439024
20
- - name: NER Recall
21
- type: recall
22
- value: 0.9125
23
- - name: NER F Score
24
- type: f_score
25
- value: 0.9012345679
26
- - task:
27
- name: SENTER
28
- type: token-classification
29
- metrics:
30
- - name: SENTER Precision
31
- type: precision
32
- value: 0.9608540925
33
- - name: SENTER Recall
34
- type: recall
35
- value: 0.9574468085
36
- - name: SENTER F Score
37
- type: f_score
38
- value: 0.9591474245
39
- - task:
40
- name: UNLABELED_DEPENDENCIES
41
- type: token-classification
42
- metrics:
43
- - name: Unlabeled Dependencies Accuracy
44
- type: accuracy
45
- value: 0.9074560179
46
- - task:
47
- name: LABELED_DEPENDENCIES
48
- type: token-classification
49
- metrics:
50
- - name: Labeled Dependencies Accuracy
51
- type: accuracy
52
- value: 0.9074560179
53
- ---
54
 
55
  <a href="https://github.com/centre-for-humanities-computing/Dacy"><img src="https://centre-for-humanities-computing.github.io/DaCy/_static/icon.png" width="175" height="175" align="right" /></a>
56
 
57
- # DaCy large transformer
58
 
59
  DaCy is a Danish language processing framework with state-of-the-art pipelines as well as functionality for analysing Danish pipelines.
60
- DaCy's largest pipeline has achieved State-of-the-Art performance on Named entity recognition, part-of-speech tagging and dependency
61
- parsing for Danish on the DaNE dataset. Check out the [DaCy repository](https://github.com/centre-for-humanities-computing/DaCy) for material on how to use DaCy and reproduce the results.
 
62
  DaCy also contains guides on usage of the package as well as behavioural test for biases and robustness of Danish NLP pipelines.
63
-
64
 
65
  | Feature | Description |
66
  | --- | --- |
67
  | **Name** | `da_dacy_large_trf` |
68
- | **Version** | `0.1.0` |
69
- | **spaCy** | `>=3.1.1,<3.2.0` |
70
- | **Default Pipeline** | `transformer`, `morphologizer`, `parser`, `attribute_ruler`, `lemmatizer`, `ner` |
71
- | **Components** | `transformer`, `morphologizer`, `parser`, `attribute_ruler`, `lemmatizer`, `ner` |
72
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
73
- | **Sources** | [UD Danish DDT v2.5](https://github.com/UniversalDependencies/UD_Danish-DDT) (Johannsen, Anders; Martínez Alonso, Héctor; Plank, Barbara)<br />[DaNE](https://github.com/alexandrainst/danlp/blob/master/docs/datasets.md#danish-dependency-treebank-dane) (Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders Søgaard)<br />[xlm-roberta-large](https://huggingface.co/xlm-roberta-large) (Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov) |
74
- | **License** | `Apache-2.0 License` |
75
- | **Author** | [Centre for Humanities Computing Aarhus](https://chcaa.io/#/) |
76
 
77
  ### Label Scheme
78
 
79
  <details>
80
 
81
- <summary>View label scheme (192 labels for 3 components)</summary>
82
 
83
  | Component | Labels |
84
  | --- | --- |
85
- | **`morphologizer`** | `AdpType=Prep\|POS=ADP`, `Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=AUX\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=PROPN`, `Definite=Ind\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `POS=SCONJ`, `Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=ADV`, `Number=Plur\|POS=DET\|PronType=Dem`, `Degree=Pos\|Number=Plur\|POS=ADJ`, `Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `POS=PUNCT`, `POS=CCONJ`, `Definite=Ind\|Degree=Cmp\|Number=Sing\|POS=ADJ`, `Degree=Cmp\|POS=ADJ`, `POS=PRON\|PartType=Inf`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Definite=Ind\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Case=Acc\|Gender=Neut\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Dem`, `Degree=Pos\|POS=ADV`, `Definite=Def\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `POS=PRON\|PronType=Dem`, `NumType=Card\|POS=NUM`, `Definite=Ind\|Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `NumType=Ord\|POS=ADJ`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Mood=Ind\|POS=AUX\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `POS=VERB\|VerbForm=Inf\|Voice=Act`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `POS=NOUN`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Pass`, `POS=ADP\|PartType=Inf`, `Degree=Pos\|POS=ADJ`, `Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, 
`Case=Gen\|Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `POS=AUX\|VerbForm=Inf\|Voice=Act`, `Definite=Ind\|Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Dem`, `Number=Plur\|POS=DET\|PronType=Ind`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Ind`, `Case=Acc\|POS=PRON\|Person=3\|PronType=Prs\|Reflex=Yes`, `POS=PART\|PartType=Inf`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Ind`, `Case=Acc\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Case=Nom\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Nom\|Gender=Com\|POS=PRON\|PronType=Ind`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Ind`, `Mood=Imp\|POS=VERB`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Definite=Ind\|Number=Sing\|POS=AUX\|Tense=Past\|VerbForm=Part`, `POS=X`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `POS=VERB\|Tense=Pres\|VerbForm=Part`, `Number=Plur\|POS=PRON\|PronType=Int,Rel`, `POS=VERB\|VerbForm=Inf\|Voice=Pass`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Degree=Cmp\|POS=ADV`, `POS=ADV\|PartType=Inf`, `Degree=Sup\|POS=ADV`, `Number=Plur\|POS=PRON\|PronType=Dem`, `Number=Plur\|POS=PRON\|PronType=Ind`, `Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|POS=PROPN`, `POS=ADP`, `Degree=Cmp\|Number=Plur\|POS=ADJ`, `Definite=Def\|Degree=Sup\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Degree=Pos\|Number=Sing\|POS=ADJ`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Gender=Com\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, 
`Number=Plur\|POS=PRON\|PronType=Rcp`, `Case=Gen\|Degree=Cmp\|POS=ADJ`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Number[psor]=Plur\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, `POS=INTJ`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Number=Plur\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Definite=Def\|Degree=Sup\|Number=Plur\|POS=ADJ`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Definite=Ind\|Number=Sing\|POS=NOUN`, `Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Number=Plur\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `POS=SYM`, `Case=Nom\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `Degree=Sup\|POS=ADJ`, `Number=Plur\|POS=DET\|PronType=Ind\|Style=Arch`, `Case=Gen\|Gender=Com\|Number=Sing\|POS=DET\|PronType=Dem`, `Foreign=Yes\|POS=X`, `POS=DET\|Person=2\|Polite=Form\|Poss=Yes\|PronType=Prs`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Dem`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Case=Gen\|POS=PRON\|PronType=Int,Rel`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Dem`, `Abbr=Yes\|POS=X`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `Definite=Def\|Degree=Abs\|POS=ADJ`, `Definite=Ind\|Degree=Sup\|Number=Sing\|POS=ADJ`, 
`Definite=Ind\|POS=NOUN`, `Gender=Com\|Number=Plur\|POS=NOUN`, `Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|POS=PRON\|PronType=Int,Rel`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Degree=Abs\|POS=ADV`, `POS=VERB\|VerbForm=Ger`, `POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Degree=Sup\|Number=Sing\|POS=ADJ`, `Number=Plur\|Number[psor]=Plur\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Gen\|Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Case=Gen\|Degree=Pos\|Number=Plur\|POS=ADJ`, `Case=Acc\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `POS=VERB\|Tense=Pres`, `Case=Gen\|Number=Plur\|POS=DET\|PronType=Ind`, `Number[psor]=Plur\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `POS=PRON\|Person=2\|Polite=Form\|Poss=Yes\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `POS=AUX\|Tense=Pres\|VerbForm=Part`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Pass`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Degree=Sup\|Number=Plur\|POS=ADJ`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Definite=Ind\|Number=Plur\|POS=NOUN`, `Case=Gen\|Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Mood=Imp\|POS=AUX`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs`, `Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `Definite=Def\|Gender=Com\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Case=Gen\|POS=NOUN`, `Number[psor]=Plur\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `POS=DET\|PronType=Dem`, 
`Definite=Def\|Number=Plur\|POS=NOUN` |
86
- | **`parser`** | `ROOT`, `acl:relcl`, `advcl`, `advmod`, `amod`, `appos`, `aux`, `case`, `cc`, `ccomp`, `compound:prt`, `conj`, `cop`, `dep`, `det`, `expl`, `fixed`, `flat`, `iobj`, `list`, `mark`, `nmod`, `nmod:poss`, `nsubj`, `nummod`, `obj`, `obl`, `obl:loc`, `obl:tmod`, `punct`, `xcomp` |
 
87
  | **`ner`** | `LOC`, `MISC`, `ORG`, `PER` |
88
 
89
  </details>
@@ -92,104 +41,37 @@ DaCy also contains guides on usage of the package as well as behavioural test fo
92
 
93
  | Type | Score |
94
  | --- | --- |
95
- | `POS_ACC` | 98.70 |
96
- | `MORPH_ACC` | 98.49 |
97
- | `DEP_UAS` | 90.75 |
98
- | `DEP_LAS` | 88.38 |
99
- | `SENTS_P` | 96.09 |
100
- | `SENTS_R` | 95.74 |
101
- | `SENTS_F` | 95.91 |
102
- | `LEMMA_ACC` | 84.91 |
103
- | `ENTS_F` | 90.12 |
104
- | `ENTS_P` | 89.02 |
105
- | `ENTS_R` | 91.25 |
106
- | `TRANSFORMER_LOSS` | 1805626.49 |
107
- | `MORPHOLOGIZER_LOSS` | 111735.86 |
108
- | `PARSER_LOSS` | 8037491.27 |
109
- | `NER_LOSS` | 16634.46 |
110
-
111
-
112
- ## Bias and Robustness
113
-
114
- Besides the validation done by SpaCy on the DaNE testset, DaCy also provides a series of augmentations to the DaNE test set to see how well the models deal with these types of augmentations.
115
- The can be seen as behavioural probes akinn to the NLP checklist.
116
-
117
- ### Deterministic Augmentations
118
- Deterministic augmentations are augmentation which always yield the same result.
119
-
120
- | Augmentation | Part-of-speech tagging (Accuracy) | Morphological tagging (Accuracy) | Dependency Parsing (UAS) | Dependency Parsing (LAS) | Sentence segmentation (F1) | Lemmatization (Accuracy) | Named entity recognition (F1) |
121
- | --- | --- | --- | --- | --- | --- | --- | --- |
122
- | No augmentation | 0.985 | 0.979 | 0.906 | 0.881 | 0.986 | 0.844 | 0.839 |
123
- | Æøå Augmentation | 0.973 | 0.963 | 0.892 | 0.863 | 0.975 | 0.754 | 0.815 |
124
- | Lowercase | 0.981 | 0.975 | 0.902 | 0.876 | 0.93 | 0.848 | 0.788 |
125
- | No Spacing | 0.227 | 0.229 | 0.004 | 0.004 | 0.54 | 0.225 | 0.086 |
126
- | Abbreviated first names | 0.984 | 0.978 | 0.903 | 0.878 | 0.986 | 0.845 | 0.839 |
127
- | Input size augmentation 5 sentences | 0.986 | 0.981 | 0.904 | 0.88 | 0.97 | 0.844 | 0.847 |
128
- | Input size augmentation 10 sentences | 0.986 | 0.981 | 0.905 | 0.881 | 0.964 | 0.844 | 0.849 |
129
-
130
-
131
-
132
- ### Stochastic Augmentations
133
- Stochastic augmentations are augmentation which are repeated mulitple times to estimate the effect of the augmentation.
134
-
135
- | Augmentation | Part-of-speech tagging (Accuracy) | Morphological tagging (Accuracy) | Dependency Parsing (UAS) | Dependency Parsing (LAS) | Sentence segmentation (F1) | Lemmatization (Accuracy) | Named entity recognition (F1) |
136
- | --- | --- | --- | --- | --- | --- | --- | --- |
137
- | Keystroke errors 2% | 0.949 (0.002) | 0.944 (0.002) | 0.868 (0.002) | 0.833 (0.002) | 0.965 (0.002) | 0.773 (0.002) | 0.775 (0.002) |
138
- | Keystroke errors 5% | 0.895 (0.003) | 0.893 (0.003) | 0.81 (0.003) | 0.76 (0.003) | 0.92 (0.003) | 0.68 (0.003) | 0.698 (0.003) |
139
- | Keystroke errors 15% | 0.705 (0.005) | 0.72 (0.005) | 0.6 (0.005) | 0.518 (0.005) | 0.801 (0.005) | 0.462 (0.005) | 0.506 (0.005) |
140
- | Danish names | 0.984 (0.0) | 0.979 (0.0) | 0.904 (0.0) | 0.879 (0.0) | 0.987 (0.0) | 0.847 (0.0) | 0.844 (0.0) |
141
- | Muslim names | 0.984 (0.0) | 0.979 (0.0) | 0.904 (0.0) | 0.879 (0.0) | 0.987 (0.0) | 0.847 (0.0) | 0.844 (0.0) |
142
- | Female names | 0.984 (0.0) | 0.979 (0.0) | 0.904 (0.0) | 0.879 (0.0) | 0.986 (0.0) | 0.847 (0.0) | 0.846 (0.0) |
143
- | Male names | 0.984 (0.0) | 0.979 (0.0) | 0.904 (0.0) | 0.879 (0.0) | 0.986 (0.0) | 0.846 (0.0) | 0.845 (0.0) |
144
- | Spacing Augmention 5% | 0.946 (0.002) | 0.941 (0.002) | 0.794 (0.002) | 0.771 (0.002) | 0.969 (0.002) | 0.812 (0.002) | 0.781 (0.002) |
145
-
146
- <details>
147
-
148
- <summary> Description of Augmenters </summary>
149
-
150
-
151
-
152
- **No augmentation:**
153
- Applies no augmentation to the DaNE test set.
154
-
155
- **Æøå Augmentation:**
156
- This augmentation replace the æ,ø, and å with their spelling variations ae, oe and aa respectively.
157
-
158
- **Lowercase:**
159
- This augmentation lowercases all text.
160
-
161
- **No Spacing:**
162
- This augmentation removed all spacing from the text.
163
-
164
- **Abbreviated first names:**
165
- This agmentation abbreviates the first names of entities. For instance 'Kenneth Enevoldsen' would turn to 'K. Enevoldsen'.
166
-
167
- **Keystroke errors 2%:**
168
- This agmentation simulate keystroke errors by replacing 2% of keys with a neighbouring key on a Danish QWERTY keyboard. As this agmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parenthesis.
169
-
170
- **Keystroke errors 5%:**
171
- This agmentation simulate keystroke errors by replacing 5% of keys with a neighbouring key on a Danish QWERTY keyboard. As this agmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parenthesis.
172
-
173
- **Keystroke errors 15%:**
174
- This agmentation simulate keystroke errors by replacing 15% of keys with a neighbouring key on a Danish QWERTY keyboard. As this agmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parenthesis.
175
-
176
- **Danish names:**
177
- This agmentation replace all names with Danish names derived from Danmarks Statistik (2021). As this agmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parenthesis.
178
-
179
- **Muslim names:**
180
- This agmentation replace all names with Muslim names derived from Meldgaard (2005). As this agmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parenthesis.
181
-
182
- **Female names:**
183
- This agmentation replace all names with Danish female names derived from Danmarks Statistik (2021). As this agmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parenthesis.
184
-
185
- **Male names:**
186
- This agmentation replace all names with Danish male names derived from Danmarks Statistik (2021). As this agmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parenthesis.
187
-
188
- **Spacing Augmention 5%:**
189
- This agmentation replace all names with Danish male names derived from Danmarks Statistik (2021). As this agmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parenthesis.
190
- </details>
191
- <br />
192
-
193
-
194
- ### Hardware
195
- This was run and trained on a Quadro RTX 8000 GPU.
1
 
2
  <a href="https://github.com/centre-for-humanities-computing/Dacy"><img src="https://centre-for-humanities-computing.github.io/DaCy/_static/icon.png" width="175" height="175" align="right" /></a>
3
 
4
+ # DaCy large
5
 
6
  DaCy is a Danish language processing framework with state-of-the-art pipelines as well as functionality for analysing Danish pipelines.
7
+ DaCy's largest pipeline has achieved State-of-the-Art performance on parts-of-speech tagging and dependency
8
+ parsing for Danish on the Danish Dependency treebank as well as competitive performance on named entity recognition, named entity disambiguation and coreference resolution.
9
+ To read more check out the [DaCy repository](https://github.com/centre-for-humanities-computing/DaCy) for material on how to use DaCy and reproduce the results.
10
  DaCy also contains guides on usage of the package as well as behavioural test for biases and robustness of Danish NLP pipelines.
11
+
12
 
13
  | Feature | Description |
14
  | --- | --- |
15
  | **Name** | `da_dacy_large_trf` |
16
+ | **Version** | `0.2.0` |
17
+ | **spaCy** | `>=3.5.2,<3.6.0` |
18
+ | **Default Pipeline** | `transformer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `parser`, `ner`, `coref`, `span_resolver`, `span_cleaner`, `entity_linker` |
19
+ | **Components** | `transformer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `parser`, `ner`, `coref`, `span_resolver`, `span_cleaner`, `entity_linker` |
20
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
21
+ | **Sources** | [UD Danish DDT v2.11](https://github.com/UniversalDependencies/UD_Danish-DDT) (Johannsen, Anders; Martínez Alonso, Héctor; Plank, Barbara)<br />[DaNE](https://huggingface.co/datasets/dane) (Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders Søgaard)<br />[DaCoref](https://huggingface.co/datasets/alexandrainst/dacoref) (Buch-Kromann, Matthias)<br />[DaNED](https://danlp-alexandra.readthedocs.io/en/stable/docs/datasets.html#daned) (Barrett, M. J., Lam, H., Wu, M., Lacroix, O., Plank, B., & Søgaard, A.)<br />[chcaa/dfm-encoder-large-v1](https://huggingface.co/chcaa/dfm-encoder-large-v1) (The Danish Foundation Models team) |
22
+ | **License** | `Apache-2.0` |
23
+ | **Author** | [Kenneth Enevoldsen](https://chcaa.io/#/) |
24
 
25
  ### Label Scheme
26
 
27
  <details>
28
 
29
+ <summary>View label scheme (211 labels for 4 components)</summary>
30
 
31
  | Component | Labels |
32
  | --- | --- |
33
+ | **`tagger`** | `ADJ`, `ADP`, `ADV`, `AUX`, `CCONJ`, `DET`, `INTJ`, `NOUN`, `NUM`, `PART`, `PRON`, `PROPN`, `PUNCT`, `SCONJ`, `SYM`, `VERB`, `X` |
34
+ | **`morphologizer`** | `AdpType=Prep\|POS=ADP`, `Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=AUX\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=PROPN`, `Definite=Ind\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `POS=SCONJ`, `Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=ADV`, `Number=Plur\|POS=DET\|PronType=Dem`, `Degree=Pos\|Number=Plur\|POS=ADJ`, `Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `POS=PUNCT`, `NumType=Ord\|POS=ADJ`, `POS=CCONJ`, `Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `POS=VERB\|VerbForm=Inf\|Voice=Act`, `Case=Acc\|Gender=Neut\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Degree=Sup\|POS=ADV`, `Degree=Pos\|POS=ADV`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Number=Plur\|POS=DET\|PronType=Ind`, `POS=ADP`, `POS=ADV\|PartType=Inf`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Mood=Ind\|POS=AUX\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `POS=ADP\|PartType=Inf`, `Definite=Ind\|Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `NumType=Card\|POS=NUM`, `Degree=Pos\|POS=ADJ`, `Definite=Ind\|Number=Sing\|POS=AUX\|Tense=Past\|VerbForm=Part`, `POS=PART\|PartType=Inf`, `Case=Acc\|POS=PRON\|Person=3\|PronType=Prs\|Reflex=Yes`, `Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Number[psor]=Plur\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, `POS=VERB\|Tense=Pres\|VerbForm=Part`, `Case=Nom\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Gen\|Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `Definite=Def\|Degree=Sup\|Number=Plur\|POS=ADJ`, `Case=Acc\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `POS=AUX\|VerbForm=Inf\|Voice=Act`, 
`Definite=Ind\|Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Definite=Ind\|Degree=Cmp\|Number=Sing\|POS=ADJ`, `Degree=Cmp\|POS=ADJ`, `POS=PRON\|PartType=Inf`, `Definite=Ind\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Case=Nom\|Gender=Com\|POS=PRON\|PronType=Ind`, `Number=Plur\|POS=PRON\|PronType=Ind`, `POS=INTJ`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Dem`, `Case=Gen\|Number=Plur\|POS=DET\|PronType=Ind`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Pass`, `Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Degree=Cmp\|POS=ADV`, `Number=Plur\|Number[psor]=Plur\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Case=Gen\|POS=PROPN`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Ind`, `Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Definite=Def\|Degree=Sup\|POS=ADJ`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Ind`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Dem`, `Definite=Def\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `POS=PRON\|PronType=Dem`, `Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `Number=Plur\|POS=NUM`, `POS=VERB\|VerbForm=Inf\|Voice=Pass`, `Definite=Def\|Degree=Sup\|Number=Sing\|POS=ADJ`, `Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `POS=PRON`, `Definite=Ind\|Number=Sing\|POS=NOUN`, `Definite=Ind\|Number=Sing\|POS=NUM`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, 
`Foreign=Yes\|POS=ADV`, `POS=NOUN`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Gender=Com\|Number=Plur\|POS=NOUN`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Ind`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Degree=Sup\|POS=ADJ`, `Degree=Pos\|Number=Sing\|POS=ADJ`, `Mood=Imp\|POS=VERB`, `Case=Nom\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `Case=Acc\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `POS=X`, `Case=Gen\|Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `Number=Plur\|POS=PRON\|PronType=Dem`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Number=Plur\|POS=PRON\|PronType=Int,Rel`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Degree=Cmp\|Number=Plur\|POS=ADJ`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Com\|POS=PRON\|PronType=Int,Rel`, `Case=Gen\|Degree=Pos\|Number=Plur\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `POS=VERB\|VerbForm=Ger`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Dem`, `Case=Gen\|POS=PRON\|PronType=Int,Rel`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Pass`, `Abbr=Yes\|POS=X`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Definite=Ind\|Number=Plur\|POS=NOUN`, `Foreign=Yes\|POS=X`, 
`Number=Plur\|POS=PRON\|PronType=Rcp`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Case=Gen\|Degree=Cmp\|POS=ADJ`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Dem`, `Number=Plur\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Gender=Neut\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Number=Plur\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Number=Plur\|POS=PRON\|PronType=Rcp`, `POS=DET\|Person=2\|Polite=Form\|Poss=Yes\|PronType=Prs`, `POS=SYM`, `POS=DET\|PronType=Dem`, `Gender=Com\|Number=Sing\|POS=NUM`, `Number[psor]=Plur\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Degree=Abs\|POS=ADJ`, `POS=VERB\|Tense=Pres`, `Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NUM`, `Degree=Abs\|POS=ADV`, `Case=Gen\|Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Ind\|Degree=Sup\|Number=Sing\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Number[psor]=Plur\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `Definite=Ind\|POS=NOUN`, `Case=Gen\|Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Definite=Ind\|Gender=Com\|Number=Sing\|POS=NUM`, `Definite=Def\|Number=Plur\|POS=NOUN`, `Case=Gen\|POS=NOUN`, `POS=AUX\|Tense=Pres\|VerbForm=Part` |
35
+ | **`parser`** | `ROOT`, `acl:relcl`, `advcl`, `advmod`, `advmod:lmod`, `amod`, `appos`, `aux`, `case`, `cc`, `ccomp`, `compound:prt`, `conj`, `cop`, `dep`, `det`, `expl`, `fixed`, `flat`, `iobj`, `list`, `mark`, `nmod`, `nmod:poss`, `nsubj`, `nummod`, `obj`, `obl`, `obl:lmod`, `obl:tmod`, `punct`, `xcomp` |
36
  | **`ner`** | `LOC`, `MISC`, `ORG`, `PER` |
37
 
38
  </details>
 
41
 
42
  | Type | Score |
43
  | --- | --- |
44
+ | `TOKEN_ACC` | 99.92 |
45
+ | `TOKEN_P` | 99.70 |
46
+ | `TOKEN_R` | 99.77 |
47
+ | `TOKEN_F` | 99.74 |
48
+ | `SENTS_P` | 100.00 |
49
+ | `SENTS_R` | 100.00 |
50
+ | `SENTS_F` | 100.00 |
51
+ | `TAG_ACC` | 99.14 |
52
+ | `POS_ACC` | 99.08 |
53
+ | `MORPH_ACC` | 98.80 |
54
+ | `MORPH_MICRO_P` | 99.45 |
55
+ | `MORPH_MICRO_R` | 99.32 |
56
+ | `MORPH_MICRO_F` | 99.39 |
57
+ | `DEP_UAS` | 92.81 |
58
+ | `DEP_LAS` | 90.80 |
59
+ | `ENTS_P` | 88.58 |
60
+ | `ENTS_R` | 86.20 |
61
+ | `ENTS_F` | 87.38 |
62
+ | `LEMMA_ACC` | 95.89 |
63
+ | `COREF_LEA_F1` | 46.72 |
64
+ | `COREF_LEA_PRECISION` | 45.91 |
65
+ | `COREF_LEA_RECALL` | 47.56 |
66
+ | `NEL_SCORE` | 34.29 |
67
+ | `NEL_MICRO_P` | 84.00 |
68
+ | `NEL_MICRO_R` | 21.54 |
69
+ | `NEL_MICRO_F` | 34.29 |
70
+ | `NEL_MACRO_P` | 86.71 |
71
+ | `NEL_MACRO_R` | 24.70 |
72
+ | `NEL_MACRO_F` | 37.28 |
73
+
74
+
75
+
76
+ ### Training
77
+ This model was trained using [spaCy](https://spacy.io) and logged to [Weights & Biases](https://wandb.ai/kenevoldsen/dacy-v0.2.0). You can find all the training logs [here](https://wandb.ai/kenevoldsen/dacy-v0.2.0).
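
As a quick sanity check on the v0.2.0 score table above, the reported F-scores can be recomputed from the precision and recall rows, assuming each F value is the usual harmonic mean (F1) of the two. The sketch below is illustrative only: the helper name `f1` and the `reported` dictionary are hypothetical, and the numbers are copied by hand from the table. Small last-digit differences are expected because the published precision and recall are already rounded to two decimals.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale, here 0-100)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F) copied from the v0.2.0 score table.
reported = {
    "TOKEN": (99.70, 99.77, 99.74),
    "ENTS": (88.58, 86.20, 87.38),
    "COREF_LEA": (45.91, 47.56, 46.72),
    "NEL_MICRO": (84.00, 21.54, 34.29),
}

for name, (p, r, f_reported) in reported.items():
    f_recomputed = f1(p, r)
    # Allow a small tolerance because the inputs are rounded to two decimals.
    status = "ok" if abs(f_recomputed - f_reported) < 0.02 else "mismatch"
    print(f"{name}: recomputed {f_recomputed:.2f}, reported {f_reported:.2f} ({status})")
```

The `NEL_MACRO_*` rows are left out on purpose: a macro-averaged F is typically the mean of per-class F-scores, so it is not expected to equal the harmonic mean of the macro precision and macro recall.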
config.cfg CHANGED
@@ -1,54 +1,101 @@
1
  [paths]
2
- train = "corpus/dane/train.spacy"
3
- dev = "corpus/dane/dev.spacy"
4
- vectors = null
5
- raw = null
6
  init_tok2vec = null
7
- vocab_data = null
 
8
 
9
  [system]
10
  gpu_allocator = "pytorch"
11
- seed = 1
12
 
13
  [nlp]
14
  lang = "da"
15
- pipeline = ["transformer","morphologizer","parser","attribute_ruler","lemmatizer","ner"]
 
16
  disabled = []
17
  before_creation = null
18
  after_creation = null
19
  after_pipeline_creation = null
20
- batch_size = 64
21
  tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
22
 
23
  [components]
24
 
25
- [components.attribute_ruler]
26
- factory = "attribute_ruler"
27
- validate = false
28
 
29
- [components.lemmatizer]
30
- factory = "lemmatizer"
31
- mode = "lookup"
32
- model = null
33
- overwrite = false
34
 
35
  [components.morphologizer]
36
  factory = "morphologizer"
 
 
 
37
 
38
  [components.morphologizer.model]
39
- @architectures = "spacy.Tagger.v1"
40
  nO = null
 
41
 
42
  [components.morphologizer.model.tok2vec]
43
  @architectures = "spacy-transformers.TransformerListener.v1"
44
  grad_factor = 1.0
45
- upstream = "transformer"
46
  pooling = {"@layers":"reduce_mean.v1"}
 
47
 
48
  [components.ner]
49
  factory = "ner"
50
  incorrect_spans_key = null
51
  moves = null
 
52
  update_with_oracle_cut_size = 100
53
 
54
  [components.ner.model]
@@ -63,96 +110,169 @@ nO = null
63
  [components.ner.model.tok2vec]
64
  @architectures = "spacy-transformers.TransformerListener.v1"
65
  grad_factor = 1.0
66
- upstream = "transformer"
67
  pooling = {"@layers":"reduce_mean.v1"}
 
68
 
69
  [components.parser]
70
  factory = "parser"
71
  learn_tokens = false
72
  min_action_freq = 30
73
  moves = null
 
74
  update_with_oracle_cut_size = 100
75
 
76
  [components.parser.model]
77
  @architectures = "spacy.TransitionBasedParser.v2"
78
  state_type = "parser"
79
  extra_state_tokens = false
80
- hidden_width = 64
81
- maxout_pieces = 2
82
  use_upper = false
83
  nO = null
84
 
85
  [components.parser.model.tok2vec]
86
  @architectures = "spacy-transformers.TransformerListener.v1"
87
  grad_factor = 1.0
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
88
  upstream = "transformer"
89
  pooling = {"@layers":"reduce_mean.v1"}
90
 
91
  [components.transformer]
92
  factory = "transformer"
93
  max_batch_items = 4096
94
  set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}
95
 
96
  [components.transformer.model]
97
- @architectures = "spacy-transformers.TransformerModel.v1"
98
- name = "xlm-roberta-large"
 
99
 
100
  [components.transformer.model.get_spans]
101
  @span_getters = "spacy-transformers.strided_spans.v1"
102
- window = 128
103
- stride = 96
 
 
104
 
105
  [components.transformer.model.tokenizer_config]
106
  use_fast = true
107
- strip_accents = false
 
108
 
109
  [corpora]
110
 
111
  [corpora.dev]
112
  @readers = "spacy.Corpus.v1"
113
- limit = 0
114
- max_length = 0
115
- path = ${paths:dev}
116
  gold_preproc = false
 
 
117
  augmenter = null
118
 
119
  [corpora.train]
120
  @readers = "spacy.Corpus.v1"
121
- path = ${paths:train}
122
- max_length = 500
123
  gold_preproc = false
 
124
  limit = 0
125
-
126
- [corpora.train.augmenter]
127
- @augmenters = "spacy.lower_case.v1"
128
- level = 0.1
129
 
130
  [training]
131
- train_corpus = "corpora.train"
132
- dev_corpus = "corpora.dev"
133
- seed = ${system:seed}
134
- gpu_allocator = ${system:gpu_allocator}
135
  dropout = 0.1
136
- accumulate_gradient = 3
137
- patience = 5000
138
  max_epochs = 0
139
  max_steps = 20000
140
- eval_frequency = 1000
141
  frozen_components = []
142
- before_to_disk = null
143
  annotating_components = []
 
 
 
 
144
 
145
  [training.batcher]
146
- @batchers = "spacy.batch_by_padded.v1"
147
- discard_oversize = true
 
148
  get_length = null
149
- size = 2000
150
- buffer = 256
 
 
 
 
 
151
 
152
  [training.logger]
153
- @loggers = "spacy.WandbLogger.v1"
154
- project_name = "dacy-an-efficient-pipeline-for-danish"
155
- remove_config_values = []
156
 
157
  [training.optimizer]
158
  @optimizers = "Adam.v1"
@@ -161,66 +281,44 @@ beta2 = 0.999
161
  L2_is_weight_decay = true
162
  L2 = 0.01
163
  grad_clip = 1.0
164
- use_averages = true
165
  eps = 0.00000001
166
-
167
- [training.optimizer.learn_rate]
168
- @schedules = "warmup_linear.v1"
169
- warmup_steps = 250
170
- total_steps = 20000
171
- initial_rate = 0.00005
172
 
173
  [training.score_weights]
174
- pos_acc = 0.08
175
- morph_acc = 0.08
 
176
  morph_per_feat = null
177
- dep_uas = 0.0
178
- dep_las = 0.16
 
179
  dep_las_per_type = null
180
  sents_p = null
181
  sents_r = null
182
- sents_f = 0.02
183
- lemma_acc = 0.5
184
- ents_f = 0.16
185
  ents_p = 0.0
186
  ents_r = 0.0
187
  ents_per_type = null
 
 
 
 
 
 
 
188
 
189
  [pretraining]
190
 
191
  [initialize]
192
- vocab_data = ${paths.vocab_data}
193
  vectors = ${paths.vectors}
194
  init_tok2vec = ${paths.init_tok2vec}
 
 
195
  before_init = null
196
  after_init = null
197
 
198
  [initialize.components]
199
 
200
- [initialize.components.morphologizer]
201
-
202
- [initialize.components.morphologizer.labels]
203
- @readers = "spacy.read_labels.v1"
204
- path = "corpus/labels/morphologizer.json"
205
- require = false
206
-
207
- [initialize.components.ner]
208
-
209
- [initialize.components.ner.labels]
210
- @readers = "spacy.read_labels.v1"
211
- path = "corpus/labels/ner.json"
212
- require = false
213
-
214
- [initialize.components.parser]
215
-
216
- [initialize.components.parser.labels]
217
- @readers = "spacy.read_labels.v1"
218
- path = "corpus/labels/parser.json"
219
- require = false
220
-
221
- [initialize.lookups]
222
- @misc = "spacy.LookupsDataLoader.v1"
223
- lang = ${nlp.lang}
224
- tables = ["lexeme_norm"]
225
-
226
  [initialize.tokenizer]
 
1
  [paths]
2
+ train = null
3
+ dev = null
 
 
4
  init_tok2vec = null
5
+ vectors = null
6
+ model_source = "training/da_dacy_large_trf/model-last"
7
 
8
  [system]
9
  gpu_allocator = "pytorch"
10
+ seed = 0
11
 
12
  [nlp]
13
  lang = "da"
14
+ pipeline = ["transformer","tagger","morphologizer","trainable_lemmatizer","parser","ner","coref","span_resolver","span_cleaner","entity_linker"]
15
+ batch_size = 512
16
  disabled = []
17
  before_creation = null
18
  after_creation = null
19
  after_pipeline_creation = null
 
20
  tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
21
 
22
  [components]
23
 
24
+ [components.coref]
25
+ factory = "experimental_coref"
26
+ span_cluster_prefix = "coref_head_clusters"
27
 
28
+ [components.coref.model]
29
+ @architectures = "spacy-experimental.Coref.v1"
30
+ distance_embedding_size = 20
31
+ dropout = 0.3
32
+ hidden_size = 1024
33
+ depth = 2
34
+ antecedent_limit = 100
35
+ antecedent_batch_size = 512
36
+
37
+ [components.coref.model.tok2vec]
38
+ @architectures = "spacy-transformers.TransformerListener.v1"
39
+ grad_factor = 0.5
40
+ upstream = "transformer"
41
+ pooling = {"@layers":"reduce_mean.v1"}
42
+
43
+ [components.coref.scorer]
44
+ @scorers = "spacy-experimental.coref_scorer.v1"
45
+ span_cluster_prefix = "coref_head_clusters"
46
+
47
+ [components.entity_linker]
48
+ factory = "entity_linker"
49
+ candidates_batch_size = 1
50
+ entity_vector_length = 768
51
+ generate_empty_kb = {"@misc":"spacy.EmptyKB.v2"}
52
+ get_candidates = {"@misc":"spacy.CandidateGenerator.v1"}
53
+ get_candidates_batch = {"@misc":"spacy.CandidateBatchGenerator.v1"}
54
+ incl_context = true
55
+ incl_prior = true
56
+ labels_discard = []
57
+ n_sents = 0
58
+ overwrite = true
59
+ scorer = {"@scorers":"spacy.entity_linker_scorer.v1"}
60
+ threshold = null
61
+ use_gold_ents = true
62
+
63
+ [components.entity_linker.model]
64
+ @architectures = "spacy.EntityLinker.v2"
65
+ nO = null
66
+
67
+ [components.entity_linker.model.tok2vec]
68
+ @architectures = "spacy.HashEmbedCNN.v2"
69
+ pretrained_vectors = null
70
+ width = 96
71
+ depth = 2
72
+ embed_size = 2000
73
+ window_size = 1
74
+ maxout_pieces = 3
75
+ subword_features = true
76
 
77
  [components.morphologizer]
78
  factory = "morphologizer"
79
+ extend = false
80
+ overwrite = true
81
+ scorer = {"@scorers":"spacy.morphologizer_scorer.v1"}
82
 
83
  [components.morphologizer.model]
84
+ @architectures = "spacy.Tagger.v2"
85
  nO = null
86
+ normalize = false
87
 
88
  [components.morphologizer.model.tok2vec]
89
  @architectures = "spacy-transformers.TransformerListener.v1"
90
  grad_factor = 1.0
 
91
  pooling = {"@layers":"reduce_mean.v1"}
92
+ upstream = "transformer"
93
 
94
  [components.ner]
95
  factory = "ner"
96
  incorrect_spans_key = null
97
  moves = null
98
+ scorer = {"@scorers":"spacy.ner_scorer.v1"}
99
  update_with_oracle_cut_size = 100
100
 
101
  [components.ner.model]
 
110
  [components.ner.model.tok2vec]
111
  @architectures = "spacy-transformers.TransformerListener.v1"
112
  grad_factor = 1.0
 
113
  pooling = {"@layers":"reduce_mean.v1"}
114
+ upstream = "transformer"
115
 
116
  [components.parser]
117
  factory = "parser"
118
  learn_tokens = false
119
  min_action_freq = 30
120
  moves = null
121
+ scorer = {"@scorers":"spacy.parser_scorer.v1"}
122
  update_with_oracle_cut_size = 100
123
 
124
  [components.parser.model]
125
  @architectures = "spacy.TransitionBasedParser.v2"
126
  state_type = "parser"
127
  extra_state_tokens = false
128
+ hidden_width = 128
129
+ maxout_pieces = 3
130
  use_upper = false
131
  nO = null
132
 
133
  [components.parser.model.tok2vec]
134
  @architectures = "spacy-transformers.TransformerListener.v1"
135
  grad_factor = 1.0
136
+ pooling = {"@layers":"reduce_mean.v1"}
137
+ upstream = "transformer"
138
+
139
+ [components.span_cleaner]
140
+ factory = "experimental_span_cleaner"
141
+ prefix = "coref_head_clusters"
142
+
143
+ [components.span_resolver]
144
+ factory = "experimental_span_resolver"
145
+ input_prefix = "coref_head_clusters"
146
+ output_prefix = "coref_clusters"
147
+
148
+ [components.span_resolver.model]
149
+ @architectures = "spacy-experimental.SpanResolver.v1"
150
+ hidden_size = 1024
151
+ distance_embedding_size = 64
152
+ conv_channels = 4
153
+ window_size = 1
154
+ max_distance = 128
155
+ prefix = "coref_head_clusters"
156
+
157
+ [components.span_resolver.model.tok2vec]
158
+ @architectures = "spacy-transformers.TransformerListener.v1"
159
+ grad_factor = 0.0
160
  upstream = "transformer"
161
  pooling = {"@layers":"reduce_mean.v1"}
162
 
163
+ [components.span_resolver.scorer]
164
+ @scorers = "spacy-experimental.span_resolver_scorer.v1"
165
+ input_prefix = "coref_head_clusters"
166
+ output_prefix = "coref_clusters"
167
+
168
+ [components.tagger]
169
+ factory = "tagger"
170
+ neg_prefix = "!"
171
+ overwrite = false
172
+ scorer = {"@scorers":"spacy.tagger_scorer.v1"}
173
+
174
+ [components.tagger.model]
175
+ @architectures = "spacy.Tagger.v2"
176
+ nO = null
177
+ normalize = false
178
+
179
+ [components.tagger.model.tok2vec]
180
+ @architectures = "spacy-transformers.TransformerListener.v1"
181
+ grad_factor = 1.0
182
+ pooling = {"@layers":"reduce_mean.v1"}
183
+ upstream = "transformer"
184
+
185
+ [components.trainable_lemmatizer]
186
+ factory = "trainable_lemmatizer"
187
+ backoff = "orth"
188
+ min_tree_freq = 3
189
+ overwrite = false
190
+ scorer = {"@scorers":"spacy.lemmatizer_scorer.v1"}
191
+ top_k = 1
192
+
193
+ [components.trainable_lemmatizer.model]
194
+ @architectures = "spacy.Tagger.v2"
195
+ nO = null
196
+ normalize = false
197
+
198
+ [components.trainable_lemmatizer.model.tok2vec]
199
+ @architectures = "spacy-transformers.TransformerListener.v1"
200
+ grad_factor = 1.0
201
+ pooling = {"@layers":"reduce_mean.v1"}
202
+ upstream = "transformer"
203
+
204
  [components.transformer]
205
  factory = "transformer"
206
  max_batch_items = 4096
207
  set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}
208
 
209
  [components.transformer.model]
210
+ @architectures = "spacy-transformers.TransformerModel.v3"
211
+ name = "chcaa/dfm-encoder-large-v1"
212
+ mixed_precision = false
213
 
214
  [components.transformer.model.get_spans]
215
  @span_getters = "spacy-transformers.strided_spans.v1"
216
+ window = 400
217
+ stride = 350
218
+
219
+ [components.transformer.model.grad_scaler_config]
220
 
221
  [components.transformer.model.tokenizer_config]
222
  use_fast = true
223
+
224
+ [components.transformer.model.transformer_config]
225
 
226
  [corpora]
227
 
228
  [corpora.dev]
229
  @readers = "spacy.Corpus.v1"
230
+ path = ${paths.dev}
 
 
231
  gold_preproc = false
232
+ max_length = 0
233
+ limit = 0
234
  augmenter = null
235
 
236
  [corpora.train]
237
  @readers = "spacy.Corpus.v1"
238
+ path = ${paths.train}
 
239
  gold_preproc = false
240
+ max_length = 0
241
  limit = 0
242
+ augmenter = null
 
 
 
243
 
244
  [training]
245
+ seed = ${system.seed}
246
+ gpu_allocator = ${system.gpu_allocator}
 
 
247
  dropout = 0.1
248
+ accumulate_gradient = 1
249
+ patience = 1600
250
  max_epochs = 0
251
  max_steps = 20000
252
+ eval_frequency = 200
253
  frozen_components = []
 
254
  annotating_components = []
255
+ dev_corpus = "corpora.dev"
256
+ train_corpus = "corpora.train"
257
+ before_to_disk = null
258
+ before_update = null
259
 
260
  [training.batcher]
261
+ @batchers = "spacy.batch_by_words.v1"
262
+ discard_oversize = false
263
+ tolerance = 0.2
264
  get_length = null
265
+
266
+ [training.batcher.size]
267
+ @schedules = "compounding.v1"
268
+ start = 100
269
+ stop = 1000
270
+ compound = 1.001
271
+ t = 0.0
272
 
273
  [training.logger]
274
+ @loggers = "spacy.ConsoleLogger.v1"
275
+ progress_bar = false
 
276
 
277
  [training.optimizer]
278
  @optimizers = "Adam.v1"
 
281
  L2_is_weight_decay = true
282
  L2 = 0.01
283
  grad_clip = 1.0
284
+ use_averages = false
285
  eps = 0.00000001
286
+ learn_rate = 0.001
 
 
 
 
 
287
 
288
  [training.score_weights]
289
+ tag_acc = 0.12
290
+ pos_acc = 0.06
291
+ morph_acc = 0.06
292
  morph_per_feat = null
293
+ lemma_acc = 0.12
294
+ dep_uas = 0.06
295
+ dep_las = 0.06
296
  dep_las_per_type = null
297
  sents_p = null
298
  sents_r = null
299
+ sents_f = 0.0
300
+ ents_f = 0.12
 
301
  ents_p = 0.0
302
  ents_r = 0.0
303
  ents_per_type = null
304
+ coref_f = 0.12
305
+ coref_p = null
306
+ coref_r = null
307
+ span_accuracy = 0.12
308
+ nel_micro_f = 0.12
309
+ nel_micro_r = null
310
+ nel_micro_p = null
311
 
312
  [pretraining]
313
 
314
  [initialize]
 
315
  vectors = ${paths.vectors}
316
  init_tok2vec = ${paths.init_tok2vec}
317
+ vocab_data = null
318
+ lookups = null
319
  before_init = null
320
  after_init = null
321
 
322
  [initialize.components]
323
 
324
  [initialize.tokenizer]
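In the new `config.cfg`, the fixed `size = 2000` batcher setting is replaced by a `[training.batcher.size]` block using `compounding.v1` with `start = 100`, `stop = 1000` and `compound = 1.001`. The sketch below illustrates what such a schedule produces, assuming the value is multiplied by `compound` at every step and clipped at `stop`. The function names are hypothetical and this is not the Thinc implementation; the `t` parameter from the config is not modelled here.

```python
def compounding_value(
    step: int,
    start: float = 100.0,
    stop: float = 1000.0,
    compound: float = 1.001,
) -> float:
    """Batch-size target at `step` under the assumed compounding rule."""
    value = start * compound ** step
    # Clipping with min() is valid here because start < stop and compound > 1.
    return min(value, stop)


def first_step_at_stop(
    start: float = 100.0, stop: float = 1000.0, compound: float = 1.001
) -> int:
    """Smallest step at which the schedule has reached `stop`."""
    step = 0
    while compounding_value(step, start, stop, compound) < stop:
        step += 1
    return step


sizes = [compounding_value(s) for s in (0, 1000, 2000, 3000)]
steps_to_cap = first_step_at_stop()

print([round(s, 1) for s in sizes])
print(steps_to_cap)
```

Under this assumption the target batch size (in words, since the batcher is `spacy.batch_by_words.v1`) grows slowly and reaches the cap of 1000 after roughly 2,300 steps, well within `max_steps = 20000`.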
coref/cfg ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ {
2
+ "nI":1024
3
+ }
transformer/model/pytorch_model.bin → coref/model RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:8964c70b5415b19dc9a2bb01a796e60df3ada1c543263b1803c61c269213e89b
3
- size 2239724887
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:530693cc6611b9b24a03c8cabd239861c9565c5015490d33c2682d3755991014
3
+ size 54662492
da_dacy_large_trf-any-py3-none-any.whl CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:8b3a113d8d6e3e3d9077fcbb4511cf8182765ee5063790a8a87952a1f916cdbf
3
- size 1823104200
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e892bff73b78466e9a3f1f5bfe987d46e320161d83429d636687e1dcfb787200
3
+ size 1394022249
entity_linker/cfg ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ {
2
+ "overwrite":true
3
+ }
entity_linker/kb/contents ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ff702a30fb3360af93dd43795238cb332814bc6a6889f2a477f4ff2d050ad9d4
3
+ size 11649540
transformer/model/sentencepiece.bpe.model → entity_linker/kb/strings.json RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:cfc8146abe2a0488e9e2a0c56de7952f7c11ab059eca145a0a727afce0db2865
3
- size 5069051
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ec97f460233cd7e225ffce56c87cfe570210c27ccdb209855f13ebba48c7a1bf
3
+ size 544073
entity_linker/model ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4676186143a40b26ea1e023fda43b9ef083c68f4f32354253f132dd8b6151209
3
+ size 3510902
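The `coref/model`, `entity_linker/kb/*` and `entity_linker/model` entries above are Git LFS pointer files rather than the binaries themselves: each holds a `version` line, an `oid` with the hash algorithm and digest, and a `size` in bytes. A minimal sketch of reading such a pointer follows; the helper name `parse_lfs_pointer` and the returned dictionary keys are hypothetical, and it assumes exactly the three-line layout shown in this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its version, hash and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents copied from the `entity_linker/model` entry in this commit.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:4676186143a40b26ea1e023fda43b9ef083c68f4f32354253f132dd8b6151209\n"
    "size 3510902\n"
)

print(pointer["hash_algorithm"], pointer["size_bytes"])
```

Comparing `size_bytes` across the pointers in a commit is a quick way to see which artifacts dominate the download, for example the wheel file versus the entity linker model.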
meta.json CHANGED
@@ -1,14 +1,14 @@
1
  {
2
  "lang":"da",
3
  "name":"dacy_large_trf",
4
- "version":"0.1.0",
5
- "description":"\n<a href=\"https://github.com/centre-for-humanities-computing/Dacy\"><img src=\"https://centre-for-humanities-computing.github.io/DaCy/_static/icon.png\" width=\"175\" height=\"175\" align=\"right\" /></a>\n\n# DaCy large transformer\n\nDaCy is a Danish language processing framework with state-of-the-art pipelines as well as functionality for analysing Danish pipelines.\nDaCy's largest pipeline has achieved State-of-the-Art performance on Named entity recognition, part-of-speech tagging and dependency \nparsing for Danish on the DaNE dataset. Check out the [DaCy repository](https://github.com/centre-for-humanities-computing/DaCy) for material on how to use DaCy and reproduce the results. \nDaCy also contains guides on usage of the package as well as behavioural test for biases and robustness of Danish NLP pipelines.\n ",
6
- "author":"Centre for Humanities Computing Aarhus",
7
  "email":"[email protected]",
8
  "url":"https://chcaa.io/#/",
9
- "license":"Apache-2.0 License",
10
- "spacy_version":">=3.1.1,<3.2.0",
11
- "spacy_git_version":"ffaead8fe",
12
  "vectors":{
13
  "width":0,
14
  "vectors":0,
@@ -18,6 +18,25 @@
18
  "labels":{
19
  "transformer":[
20
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
21
  ],
22
  "morphologizer":[
23
  "AdpType=Prep|POS=ADP",
@@ -34,155 +53,157 @@
34
  "Degree=Pos|Number=Plur|POS=ADJ",
35
  "Definite=Ind|Gender=Com|Number=Plur|POS=NOUN",
36
  "POS=PUNCT",
 
37
  "POS=CCONJ",
38
- "Definite=Ind|Degree=Cmp|Number=Sing|POS=ADJ",
39
- "Degree=Cmp|POS=ADJ",
40
- "POS=PRON|PartType=Inf",
41
- "Gender=Com|Number=Sing|POS=DET|PronType=Ind",
42
- "Definite=Ind|Degree=Pos|Number=Sing|POS=ADJ",
43
- "Case=Acc|Gender=Neut|Number=Sing|POS=PRON|Person=3|PronType=Prs",
44
  "Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN",
45
- "Definite=Def|Degree=Pos|Number=Sing|POS=ADJ",
46
- "Gender=Neut|Number=Sing|POS=DET|PronType=Dem",
 
47
  "Degree=Pos|POS=ADV",
48
- "Definite=Def|Number=Sing|POS=VERB|Tense=Past|VerbForm=Part",
49
- "Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN",
50
- "POS=PRON|PronType=Dem",
51
- "NumType=Card|POS=NUM",
52
- "Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ",
53
- "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs",
54
- "Degree=Pos|Gender=Com|Number=Sing|POS=ADJ",
55
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs",
56
- "NumType=Ord|POS=ADJ",
57
- "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
58
  "Mood=Ind|POS=AUX|Tense=Past|VerbForm=Fin|Voice=Act",
59
- "POS=VERB|VerbForm=Inf|Voice=Act",
 
60
  "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Act",
61
- "POS=NOUN",
62
- "Mood=Ind|POS=VERB|Tense=Pres|VerbForm=Fin|Voice=Pass",
63
  "POS=ADP|PartType=Inf",
 
 
64
  "Degree=Pos|POS=ADJ",
 
 
 
65
  "Definite=Def|Gender=Com|Number=Plur|POS=NOUN",
66
- "Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs",
 
 
 
67
  "Case=Gen|Definite=Def|Gender=Com|Number=Sing|POS=NOUN",
 
 
68
  "POS=AUX|VerbForm=Inf|Voice=Act",
69
- "Definite=Ind|Degree=Pos|Gender=Com|Number=Sing|POS=ADJ",
 
 
 
 
 
 
 
70
  "Gender=Com|Number=Sing|POS=DET|PronType=Dem",
71
- "Number=Plur|POS=DET|PronType=Ind",
72
- "Gender=Com|Number=Sing|POS=PRON|PronType=Ind",
73
- "Case=Acc|POS=PRON|Person=3|PronType=Prs|Reflex=Yes",
74
- "POS=PART|PartType=Inf",
 
 
 
 
 
 
 
 
 
75
  "Gender=Neut|Number=Sing|POS=DET|PronType=Ind",
76
- "Case=Acc|Number=Plur|POS=PRON|Person=3|PronType=Prs",
77
- "Case=Gen|Definite=Def|Gender=Neut|Number=Sing|POS=NOUN",
78
- "Case=Nom|Number=Plur|POS=PRON|Person=3|PronType=Prs",
 
 
 
 
 
 
79
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs",
80
- "Case=Nom|Gender=Com|POS=PRON|PronType=Ind",
81
- "Gender=Neut|Number=Sing|POS=PRON|PronType=Ind",
82
- "Mood=Imp|POS=VERB",
83
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs",
84
- "Definite=Ind|Number=Sing|POS=AUX|Tense=Past|VerbForm=Part",
85
- "POS=X",
 
 
 
 
 
 
 
86
  "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs",
 
 
 
 
 
 
 
 
 
 
87
  "Case=Gen|Definite=Def|Gender=Com|Number=Plur|POS=NOUN",
88
- "POS=VERB|Tense=Pres|VerbForm=Part",
89
- "Number=Plur|POS=PRON|PronType=Int,Rel",
90
- "POS=VERB|VerbForm=Inf|Voice=Pass",
91
- "Case=Gen|Definite=Ind|Gender=Com|Number=Sing|POS=NOUN",
92
- "Degree=Cmp|POS=ADV",
93
- "POS=ADV|PartType=Inf",
94
- "Degree=Sup|POS=ADV",
95
  "Number=Plur|POS=PRON|PronType=Dem",
96
- "Number=Plur|POS=PRON|PronType=Ind",
97
- "Definite=Def|Gender=Neut|Number=Plur|POS=NOUN",
98
- "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs",
99
- "Case=Gen|POS=PROPN",
100
- "POS=ADP",
101
  "Degree=Cmp|Number=Plur|POS=ADJ",
102
- "Definite=Def|Degree=Sup|POS=ADJ",
103
- "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs",
104
- "Degree=Pos|Number=Sing|POS=ADJ",
105
- "Number=Plur|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
106
  "Gender=Com|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form",
 
 
 
 
 
 
 
 
 
 
 
 
 
 
107
  "Number=Plur|POS=PRON|PronType=Rcp",
 
108
  "Case=Gen|Degree=Cmp|POS=ADJ",
109
  "Case=Gen|Definite=Def|Gender=Neut|Number=Plur|POS=NOUN",
110
- "Number[psor]=Plur|POS=DET|Person=3|Poss=Yes|PronType=Prs",
111
- "POS=INTJ",
112
- "Number=Plur|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs",
113
- "Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ",
114
- "Gender=Neut|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form",
115
- "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs",
116
- "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs",
117
- "Case=Gen|Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN",
118
- "Number=Sing|POS=PRON|PronType=Int,Rel",
119
  "Number=Plur|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form",
120
- "Gender=Neut|Number=Sing|POS=PRON|PronType=Int,Rel",
121
- "Definite=Def|Degree=Sup|Number=Plur|POS=ADJ",
122
- "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs",
123
- "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
124
- "Definite=Ind|Number=Sing|POS=NOUN",
125
- "Number=Plur|POS=VERB|Tense=Past|VerbForm=Part",
126
  "Number=Plur|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
127
- "POS=SYM",
128
- "Case=Nom|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs",
129
- "Degree=Sup|POS=ADJ",
130
- "Number=Plur|POS=DET|PronType=Ind|Style=Arch",
131
- "Case=Gen|Gender=Com|Number=Sing|POS=DET|PronType=Dem",
132
- "Foreign=Yes|POS=X",
133
  "POS=DET|Person=2|Polite=Form|Poss=Yes|PronType=Prs",
134
- "Gender=Neut|Number=Sing|POS=PRON|PronType=Dem",
135
- "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs",
136
- "Case=Gen|Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN",
137
- "Case=Gen|POS=PRON|PronType=Int,Rel",
138
- "Gender=Com|Number=Sing|POS=PRON|PronType=Dem",
139
- "Abbr=Yes|POS=X",
140
- "Case=Gen|Definite=Ind|Gender=Com|Number=Plur|POS=NOUN",
141
  "Definite=Def|Degree=Abs|POS=ADJ",
142
- "Definite=Ind|Degree=Sup|Number=Sing|POS=ADJ",
143
- "Definite=Ind|POS=NOUN",
144
- "Gender=Com|Number=Plur|POS=NOUN",
145
- "Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs",
146
- "Gender=Com|POS=PRON|PronType=Int,Rel",
147
- "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs",
148
  "Degree=Abs|POS=ADV",
149
- "POS=VERB|VerbForm=Ger",
150
- "POS=VERB|Tense=Past|VerbForm=Part",
151
- "Definite=Def|Degree=Sup|Number=Sing|POS=ADJ",
152
- "Number=Plur|Number[psor]=Plur|POS=PRON|Person=1|Poss=Yes|PronType=Prs|Style=Form",
153
  "Case=Gen|Definite=Def|Degree=Pos|Number=Sing|POS=ADJ",
154
- "Case=Gen|Degree=Pos|Number=Plur|POS=ADJ",
155
- "Case=Acc|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs",
156
  "Gender=Com|Number=Sing|POS=PRON|PronType=Int,Rel",
157
- "POS=VERB|Tense=Pres",
158
- "Case=Gen|Number=Plur|POS=DET|PronType=Ind",
159
- "Number[psor]=Plur|POS=DET|Person=2|Poss=Yes|PronType=Prs",
160
- "POS=PRON|Person=2|Polite=Form|Poss=Yes|PronType=Prs",
161
  "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs",
162
- "POS=AUX|Tense=Pres|VerbForm=Part",
163
- "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Pass",
164
- "Gender=Com|Number=Sing|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
165
- "Degree=Sup|Number=Plur|POS=ADJ",
166
- "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs",
167
- "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
168
- "Definite=Ind|Number=Plur|POS=NOUN",
169
- "Case=Gen|Number=Plur|POS=VERB|Tense=Past|VerbForm=Part",
170
- "Mood=Imp|POS=AUX",
171
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=PRON|Person=1|Poss=Yes|PronType=Prs",
172
- "Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs",
173
- "Definite=Def|Gender=Com|Number=Sing|POS=VERB|Tense=Past|VerbForm=Part",
174
  "Number=Plur|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs",
 
 
175
  "Case=Gen|Gender=Com|Number=Sing|POS=DET|PronType=Ind",
 
 
176
  "Case=Gen|POS=NOUN",
177
- "Number[psor]=Plur|POS=PRON|Person=3|Poss=Yes|PronType=Prs",
178
- "POS=DET|PronType=Dem",
179
- "Definite=Def|Number=Plur|POS=NOUN"
180
  ],
181
  "parser":[
182
  "ROOT",
183
  "acl:relcl",
184
  "advcl",
185
  "advmod",
 
186
  "amod",
187
  "appos",
188
  "aux",
@@ -206,144 +227,162 @@
206
  "nummod",
207
  "obj",
208
  "obl",
209
- "obl:loc",
210
  "obl:tmod",
211
  "punct",
212
  "xcomp"
213
- ],
214
- "attribute_ruler":[
215
-
216
- ],
217
- "lemmatizer":[
218
-
219
  ],
220
  "ner":[
221
  "LOC",
222
  "MISC",
223
  "ORG",
224
  "PER"
 
 
 
 
 
 
 
 
 
225
  ]
226
  },
227
  "pipeline":[
228
  "transformer",
 
229
  "morphologizer",
 
230
  "parser",
231
- "attribute_ruler",
232
- "lemmatizer",
233
- "ner"
 
 
234
  ],
235
  "components":[
236
  "transformer",
 
237
  "morphologizer",
 
238
  "parser",
239
- "attribute_ruler",
240
- "lemmatizer",
241
- "ner"
 
 
242
  ],
243
  "disabled":[
244
 
245
  ],
246
- "_sourced_vectors_hashes":{
247
-
248
- },
 
249
  "performance":{
250
- "pos_acc":0.9869714729,
251
- "morph_acc":0.984937279,
 
 
 
 
 
 
 
 
 
 
 
252
  "morph_per_feat":{
253
- "Mood":{
254
- "p":0.9942802669,
255
- "r":0.9942802669,
256
- "f":0.9942802669
257
- },
258
- "Tense":{
259
- "p":0.9924471299,
260
- "r":0.9894578313,
261
- "f":0.9909502262
262
  },
263
- "VerbForm":{
264
- "p":0.99200492,
265
- "r":0.9871481028,
266
- "f":0.9895705521
267
  },
268
- "Voice":{
269
- "p":0.9955123411,
270
- "r":0.9947683109,
271
- "f":0.9951401869
272
  },
273
  "Definite":{
274
- "p":0.9916666667,
275
- "r":0.987356776,
276
- "f":0.9895070283
277
  },
278
  "Gender":{
279
- "p":0.989656323,
280
- "r":0.9857095381,
281
- "f":0.9876789877
282
  },
283
- "Number":{
284
- "p":0.9921486522,
285
- "r":0.9887845592,
286
- "f":0.9904637492
287
  },
288
- "AdpType":{
289
- "p":1.0,
290
- "r":0.9946949602,
291
- "f":0.9973404255
292
  },
293
- "PartType":{
 
 
 
 
 
 
 
 
 
 
294
  "p":1.0,
295
  "r":1.0,
296
  "f":1.0
297
  },
298
- "Case":{
299
- "p":0.9905362776,
300
- "r":0.9921011058,
301
- "f":0.9913180742
302
- },
303
- "Person":{
304
- "p":0.9911816578,
305
- "r":0.9982238011,
306
- "f":0.9946902655
307
- },
308
  "PronType":{
309
- "p":0.9934210526,
310
- "r":0.9934210526,
311
- "f":0.9934210526
312
  },
313
- "NumType":{
314
- "p":0.9931506849,
315
- "r":0.9602649007,
316
- "f":0.9764309764
317
  },
318
- "Degree":{
319
- "p":0.980861244,
320
- "r":0.9879518072,
321
- "f":0.9843937575
322
  },
323
- "Reflex":{
324
  "p":1.0,
325
  "r":1.0,
326
  "f":1.0
327
  },
328
- "Number[psor]":{
329
  "p":1.0,
330
  "r":1.0,
331
  "f":1.0
332
  },
333
- "Poss":{
 
 
 
 
 
334
  "p":1.0,
335
  "r":1.0,
336
  "f":1.0
337
  },
338
  "Foreign":{
339
- "p":0.875,
340
- "r":0.7,
341
- "f":0.7777777778
342
- },
343
- "Abbr":{
344
- "p":1.0,
345
- "r":0.4,
346
- "f":0.5714285714
347
  },
348
  "Style":{
349
  "p":1.0,
@@ -352,230 +391,295 @@
352
  },
353
  "Polite":{
354
  "p":1.0,
355
- "r":1.0,
356
- "f":1.0
 
 
 
 
 
357
  }
358
  },
359
- "dep_uas":0.9074560179,
360
- "dep_las":0.8837754817,
361
  "dep_las_per_type":{
362
- "advmod":{
363
- "p":0.8407821229,
364
- "r":0.8502824859,
365
- "f":0.845505618
366
  },
367
- "root":{
368
- "p":0.9359430605,
369
- "r":0.9326241135,
370
- "f":0.9342806394
371
  },
372
- "nsubj":{
373
- "p":0.9429175476,
374
- "r":0.94092827,
375
- "f":0.9419218585
376
  },
377
- "case":{
378
- "p":0.9418837675,
379
- "r":0.9288537549,
380
- "f":0.9353233831
381
  },
382
- "obl":{
383
- "p":0.8398133748,
384
- "r":0.8398133748,
385
- "f":0.8398133748
386
  },
387
  "cc":{
388
- "p":0.9041916168,
389
- "r":0.8779069767,
390
- "f":0.8908554572
391
  },
392
  "conj":{
393
- "p":0.8244680851,
394
- "r":0.8266666667,
395
- "f":0.8255659121
396
  },
397
- "obj":{
398
- "p":0.933460076,
399
- "r":0.9533980583,
400
- "f":0.9433237272
 
 
 
 
 
 
 
 
 
 
401
  },
402
  "aux":{
403
- "p":0.9298245614,
404
- "r":0.9271137026,
405
- "f":0.9284671533
406
  },
407
- "acl:relcl":{
408
- "p":0.8206521739,
409
- "r":0.8162162162,
410
- "f":0.8184281843
411
  },
412
- "obl:loc":{
413
- "p":0.7808219178,
414
- "r":0.8142857143,
415
- "f":0.7972027972
416
  },
417
  "det":{
418
- "p":0.9488448845,
419
- "r":0.9472817133,
420
- "f":0.9480626546
421
  },
422
- "amod":{
423
- "p":0.9011925043,
424
- "r":0.9027303754,
425
- "f":0.9019607843
426
  },
427
  "nmod:poss":{
428
- "p":0.7524752475,
429
- "r":0.7524752475,
430
- "f":0.7524752475
431
  },
432
- "ccomp":{
433
- "p":0.7727272727,
434
- "r":0.8225806452,
435
- "f":0.796875
436
- },
437
- "nummod":{
438
- "p":0.832,
439
- "r":0.8666666667,
440
- "f":0.8489795918
441
  },
442
- "flat":{
443
- "p":0.8674698795,
444
- "r":0.9536423841,
445
- "f":0.9085173502
446
  },
447
- "compound:prt":{
448
- "p":0.75,
449
- "r":0.512195122,
450
- "f":0.6086956522
451
  },
452
  "advcl":{
453
- "p":0.7818181818,
454
- "r":0.7413793103,
455
- "f":0.7610619469
456
- },
457
- "mark":{
458
- "p":0.9354166667,
459
- "r":0.9219712526,
460
- "f":0.9286452947
461
  },
462
- "cop":{
463
- "p":0.9314285714,
464
- "r":0.9314285714,
465
- "f":0.9314285714
466
  },
467
  "dep":{
468
- "p":0.2564102564,
469
- "r":0.5660377358,
470
- "f":0.3529411765
471
  },
472
- "nmod":{
473
- "p":0.7987679671,
474
- "r":0.759765625,
475
- "f":0.7787787788
476
  },
477
  "iobj":{
478
- "p":1.0,
479
- "r":0.8181818182,
480
- "f":0.9
 
 
 
 
 
 
 
 
 
 
481
  },
482
  "xcomp":{
483
- "p":0.8695652174,
484
- "r":0.6779661017,
485
- "f":0.7619047619
 
 
 
 
 
 
 
 
 
 
486
  },
487
  "list":{
488
- "p":0.7,
489
- "r":0.3888888889,
490
- "f":0.5
491
  },
492
- "vocative":{
 
 
 
 
 
493
  "p":0.0,
494
  "r":0.0,
495
  "f":0.0
496
  },
497
- "fixed":{
498
- "p":0.9487179487,
499
- "r":0.880952381,
500
- "f":0.9135802469
501
  },
502
- "expl":{
503
- "p":0.96875,
504
- "r":0.9117647059,
505
- "f":0.9393939394
506
  },
507
- "appos":{
508
- "p":0.7428571429,
509
- "r":0.7878787879,
510
- "f":0.7647058824
511
  },
512
- "obl:tmod":{
513
- "p":1.0,
514
- "r":0.3888888889,
515
- "f":0.56
516
  },
517
- "discourse":{
518
  "p":0.0,
519
  "r":0.0,
520
  "f":0.0
521
  }
522
  },
523
- "sents_p":0.9608540925,
524
- "sents_r":0.9574468085,
525
- "sents_f":0.9591474245,
526
- "lemma_acc":0.8491041162,
527
- "ents_f":0.9012345679,
528
- "ents_p":0.8902439024,
529
- "ents_r":0.9125,
530
  "ents_per_type":{
 
 
 
 
 
531
  "PER":{
532
- "p":0.9620253165,
533
- "r":0.9156626506,
534
- "f":0.9382716049
535
  },
536
  "ORG":{
537
- "p":0.9047619048,
538
- "r":0.8444444444,
539
- "f":0.8735632184
540
  },
541
  "MISC":{
542
- "p":0.7555555556,
543
- "r":0.9026548673,
544
- "f":0.8225806452
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
545
  },
546
  "LOC":{
547
- "p":0.9391304348,
548
- "r":0.972972973,
549
- "f":0.9557522124
 
 
 
 
 
550
  }
551
- },
552
- "transformer_loss":18056.2649203666,
553
- "morphologizer_loss":1117.3585804609,
554
- "parser_loss":80374.9126560178,
555
- "ner_loss":166.3446418822
556
  },
557
  "sources":[
558
  {
559
- "name":"UD Danish DDT v2.5",
560
  "url":"https://github.com/UniversalDependencies/UD_Danish-DDT",
561
  "license":"CC BY-SA 4.0",
562
  "author":"Johannsen, Anders; Mart\u00ednez Alonso, H\u00e9ctor; Plank, Barbara"
563
  },
564
  {
565
  "name":"DaNE",
566
- "url":"https://github.com/alexandrainst/danlp/blob/master/docs/datasets.md#danish-dependency-treebank-dane",
567
  "license":"CC BY-SA 4.0",
568
  "author":"Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders S\u00f8gaard"
569
  },
570
  {
571
- "name":"xlm-roberta-large",
572
- "author":"Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzm\u00e1n, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov",
573
- "url":"https://huggingface.co/xlm-roberta-large",
574
  "license":"CC BY 4.0"
575
  }
576
  ],
577
- "requirements":[
578
- "spacy-transformers>=1.0.3,<1.1.0"
579
- ],
580
- "notes":"\n## Bias and Robustness\n\nBesides the validation done by spaCy on the DaNE test set, DaCy also provides a series of augmentations to the DaNE test set to see how well the models deal with these types of augmentations.\nThese can be seen as behavioural probes akin to the NLP checklist.\n\n### Deterministic Augmentations\nDeterministic augmentations are augmentations that always yield the same result.\n\n| Augmentation | Part-of-speech tagging (Accuracy) | Morphological tagging (Accuracy) | Dependency Parsing (UAS) | Dependency Parsing (LAS) | Sentence segmentation (F1) | Lemmatization (Accuracy) | Named entity recognition (F1) |\n| --- | --- | --- | --- | --- | --- | --- | --- |\n| No augmentation | 0.985 | 0.979 | 0.906 | 0.881 | 0.986 | 0.844 | 0.839 |\n| \u00c6\u00f8\u00e5 Augmentation | 0.973 | 0.963 | 0.892 | 0.863 | 0.975 | 0.754 | 0.815 |\n| Lowercase | 0.981 | 0.975 | 0.902 | 0.876 | 0.93 | 0.848 | 0.788 |\n| No Spacing | 0.227 | 0.229 | 0.004 | 0.004 | 0.54 | 0.225 | 0.086 |\n| Abbreviated first names | 0.984 | 0.978 | 0.903 | 0.878 | 0.986 | 0.845 | 0.839 |\n| Input size augmentation 5 sentences | 0.986 | 0.981 | 0.904 | 0.88 | 0.97 | 0.844 | 0.847 |\n| Input size augmentation 10 sentences | 0.986 | 0.981 | 0.905 | 0.881 | 0.964 | 0.844 | 0.849 |\n\n\n\n### Stochastic Augmentations\nStochastic augmentations are augmentations that are repeated multiple times to estimate the effect of the augmentation.\n\n| Augmentation | Part-of-speech tagging (Accuracy) | Morphological tagging (Accuracy) | Dependency Parsing (UAS) | Dependency Parsing (LAS) | Sentence segmentation (F1) | Lemmatization (Accuracy) | Named entity recognition (F1) |\n| --- | --- | --- | --- | --- | --- | --- | --- |\n| Keystroke errors 2% | 0.949 (0.002) | 0.944 (0.002) | 0.868 (0.002) | 0.833 (0.002) | 0.965 (0.002) | 0.773 (0.002) | 0.775 (0.002) |\n| Keystroke errors 5% | 0.895 (0.003) | 0.893 (0.003) | 0.81 (0.003) | 0.76 (0.003) | 0.92 (0.003) | 0.68 (0.003) | 0.698 (0.003) |\n| Keystroke errors 15% | 0.705 (0.005) | 0.72 (0.005) | 0.6 (0.005) | 0.518 (0.005) | 0.801 (0.005) | 0.462 (0.005) | 0.506 (0.005) |\n| Danish names | 0.984 (0.0) | 0.979 (0.0) | 0.904 (0.0) | 0.879 (0.0) | 0.987 (0.0) | 0.847 (0.0) | 0.844 (0.0) |\n| Muslim names | 0.984 (0.0) | 0.979 (0.0) | 0.904 (0.0) | 0.879 (0.0) | 0.987 (0.0) | 0.847 (0.0) | 0.844 (0.0) |\n| Female names | 0.984 (0.0) | 0.979 (0.0) | 0.904 (0.0) | 0.879 (0.0) | 0.986 (0.0) | 0.847 (0.0) | 0.846 (0.0) |\n| Male names | 0.984 (0.0) | 0.979 (0.0) | 0.904 (0.0) | 0.879 (0.0) | 0.986 (0.0) | 0.846 (0.0) | 0.845 (0.0) |\n| Spacing Augmentation 5% | 0.946 (0.002) | 0.941 (0.002) | 0.794 (0.002) | 0.771 (0.002) | 0.969 (0.002) | 0.812 (0.002) | 0.781 (0.002) |\n\n<details>\n\n<summary> Description of Augmenters </summary>\n\n \n\n**No augmentation:**\nApplies no augmentation to the DaNE test set.\n\n**\u00c6\u00f8\u00e5 Augmentation:**\nThis augmentation replaces \u00e6, \u00f8, and \u00e5 with their spelling variations ae, oe, and aa, respectively.\n\n**Lowercase:**\nThis augmentation lowercases all text.\n\n**No Spacing:**\nThis augmentation removes all spacing from the text.\n\n**Abbreviated first names:**\nThis augmentation abbreviates the first names of entities. For instance, 'Kenneth Enevoldsen' would become 'K. Enevoldsen'.\n\n**Keystroke errors 2%:**\nThis augmentation simulates keystroke errors by replacing 2% of keys with a neighbouring key on a Danish QWERTY keyboard. As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.\n\n**Keystroke errors 5%:**\nThis augmentation simulates keystroke errors by replacing 5% of keys with a neighbouring key on a Danish QWERTY keyboard. As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.\n\n**Keystroke errors 15%:**\nThis augmentation simulates keystroke errors by replacing 15% of keys with a neighbouring key on a Danish QWERTY keyboard. As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.\n\n**Danish names:**\nThis augmentation replaces all names with Danish names derived from Danmarks Statistik (2021). As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.\n\n**Muslim names:**\nThis augmentation replaces all names with Muslim names derived from Meldgaard (2005). As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.\n\n**Female names:**\nThis augmentation replaces all names with Danish female names derived from Danmarks Statistik (2021). As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.\n\n**Male names:**\nThis augmentation replaces all names with Danish male names derived from Danmarks Statistik (2021). As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.\n\n**Spacing Augmentation 5%:**\nThis augmentation replaces all names with Danish male names derived from Danmarks Statistik (2021). As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.\n </details> \n <br /> \n\n\n### Hardware\nThis was run and trained on a Quadro RTX 8000 GPU."
581
  }
 
1
  {
2
  "lang":"da",
3
  "name":"dacy_large_trf",
4
+ "version":"0.2.0",
5
+ "description":"\n<a href=\"https://github.com/centre-for-humanities-computing/Dacy\"><img src=\"https://centre-for-humanities-computing.github.io/DaCy/_static/icon.png\" width=\"175\" height=\"175\" align=\"right\" /></a>\n\n# DaCy large\n\nDaCy is a Danish language processing framework with state-of-the-art pipelines as well as functionality for analysing Danish pipelines.\nDaCy's largest pipeline has achieved state-of-the-art performance on part-of-speech tagging and dependency \nparsing for Danish on the Danish Dependency Treebank, as well as competitive performance on named entity recognition, named entity disambiguation and coreference resolution. \nTo read more, check out the [DaCy repository](https://github.com/centre-for-humanities-computing/DaCy) for material on how to use DaCy and reproduce the results. \nDaCy also contains guides on usage of the package as well as behavioural tests for biases and robustness of Danish NLP pipelines.\n",
6
+ "author":"Kenneth Enevoldsen",
7
  "email":"[email protected]",
8
  "url":"https://chcaa.io/#/",
9
+ "license":"Apache-2.0",
10
+ "spacy_version":">=3.5.2,<3.6.0",
11
+ "spacy_git_version":"Unknown",
12
  "vectors":{
13
  "width":0,
14
  "vectors":0,
 
18
  "labels":{
19
  "transformer":[
20
 
21
+ ],
22
+ "tagger":[
23
+ "ADJ",
24
+ "ADP",
25
+ "ADV",
26
+ "AUX",
27
+ "CCONJ",
28
+ "DET",
29
+ "INTJ",
30
+ "NOUN",
31
+ "NUM",
32
+ "PART",
33
+ "PRON",
34
+ "PROPN",
35
+ "PUNCT",
36
+ "SCONJ",
37
+ "SYM",
38
+ "VERB",
39
+ "X"
40
  ],
41
  "morphologizer":[
42
  "AdpType=Prep|POS=ADP",
 
53
  "Degree=Pos|Number=Plur|POS=ADJ",
54
  "Definite=Ind|Gender=Com|Number=Plur|POS=NOUN",
55
  "POS=PUNCT",
56
+ "NumType=Ord|POS=ADJ",
57
  "POS=CCONJ",
 
 
 
 
 
 
58
  "Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN",
59
+ "POS=VERB|VerbForm=Inf|Voice=Act",
60
+ "Case=Acc|Gender=Neut|Number=Sing|POS=PRON|Person=3|PronType=Prs",
61
+ "Degree=Sup|POS=ADV",
62
  "Degree=Pos|POS=ADV",
63
+ "Gender=Com|Number=Sing|POS=DET|PronType=Ind",
64
+ "Number=Plur|POS=DET|PronType=Ind",
65
+ "POS=ADP",
66
+ "POS=ADV|PartType=Inf",
 
 
 
67
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs",
 
 
68
  "Mood=Ind|POS=AUX|Tense=Past|VerbForm=Fin|Voice=Act",
69
+ "Definite=Def|Degree=Pos|Number=Sing|POS=ADJ",
70
+ "Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs",
71
  "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Act",
 
 
72
  "POS=ADP|PartType=Inf",
73
+ "Definite=Ind|Degree=Pos|Gender=Com|Number=Sing|POS=ADJ",
74
+ "NumType=Card|POS=NUM",
75
  "Degree=Pos|POS=ADJ",
76
+ "Definite=Ind|Number=Sing|POS=AUX|Tense=Past|VerbForm=Part",
77
+ "POS=PART|PartType=Inf",
78
+ "Case=Acc|POS=PRON|Person=3|PronType=Prs|Reflex=Yes",
79
  "Definite=Def|Gender=Com|Number=Plur|POS=NOUN",
80
+ "Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN",
81
+ "Number[psor]=Plur|POS=DET|Person=3|Poss=Yes|PronType=Prs",
82
+ "POS=VERB|Tense=Pres|VerbForm=Part",
83
+ "Case=Nom|Number=Plur|POS=PRON|Person=3|PronType=Prs",
84
  "Case=Gen|Definite=Def|Gender=Com|Number=Sing|POS=NOUN",
85
+ "Definite=Def|Degree=Sup|Number=Plur|POS=ADJ",
86
+ "Case=Acc|Number=Plur|POS=PRON|Person=3|PronType=Prs",
87
  "POS=AUX|VerbForm=Inf|Voice=Act",
88
+ "Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ",
89
+ "Definite=Ind|Degree=Cmp|Number=Sing|POS=ADJ",
90
+ "Degree=Cmp|POS=ADJ",
91
+ "POS=PRON|PartType=Inf",
92
+ "Definite=Ind|Degree=Pos|Number=Sing|POS=ADJ",
93
+ "Case=Nom|Gender=Com|POS=PRON|PronType=Ind",
94
+ "Number=Plur|POS=PRON|PronType=Ind",
95
+ "POS=INTJ",
96
  "Gender=Com|Number=Sing|POS=DET|PronType=Dem",
97
+ "Case=Gen|Number=Plur|POS=DET|PronType=Ind",
98
+ "Mood=Ind|POS=VERB|Tense=Pres|VerbForm=Fin|Voice=Pass",
99
+ "Definite=Def|Gender=Neut|Number=Plur|POS=NOUN",
100
+ "Degree=Cmp|POS=ADV",
101
+ "Number=Plur|Number[psor]=Plur|POS=PRON|Person=1|Poss=Yes|PronType=Prs|Style=Form",
102
+ "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs",
103
+ "Number=Plur|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
104
+ "Case=Gen|POS=PROPN",
105
+ "Gender=Neut|Number=Sing|POS=PRON|PronType=Ind",
106
+ "Number=Plur|POS=VERB|Tense=Past|VerbForm=Part",
107
+ "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
108
+ "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs",
109
+ "Definite=Def|Degree=Sup|POS=ADJ",
110
  "Gender=Neut|Number=Sing|POS=DET|PronType=Ind",
111
+ "Case=Gen|Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN",
112
+ "Gender=Neut|Number=Sing|POS=DET|PronType=Dem",
113
+ "Definite=Def|Number=Sing|POS=VERB|Tense=Past|VerbForm=Part",
114
+ "POS=PRON|PronType=Dem",
115
+ "Degree=Pos|Gender=Com|Number=Sing|POS=ADJ",
116
+ "Number=Plur|POS=NUM",
117
+ "POS=VERB|VerbForm=Inf|Voice=Pass",
118
+ "Definite=Def|Degree=Sup|Number=Sing|POS=ADJ",
119
+ "Number=Sing|POS=PRON|PronType=Int,Rel",
120
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs",
121
+ "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs",
 
 
122
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs",
123
+ "POS=PRON",
124
+ "Definite=Ind|Number=Sing|POS=NOUN",
125
+ "Definite=Ind|Number=Sing|POS=NUM",
126
+ "Case=Gen|Definite=Ind|Gender=Com|Number=Sing|POS=NOUN",
127
+ "Foreign=Yes|POS=ADV",
128
+ "POS=NOUN",
129
+ "Case=Gen|Definite=Def|Gender=Neut|Number=Sing|POS=NOUN",
130
+ "Gender=Com|Number=Plur|POS=NOUN",
131
+ "Gender=Neut|Number=Sing|POS=PRON|PronType=Int,Rel",
132
  "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs",
133
+ "Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs",
134
+ "Gender=Com|Number=Sing|POS=PRON|PronType=Ind",
135
+ "Case=Gen|Definite=Ind|Gender=Com|Number=Plur|POS=NOUN",
136
+ "Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ",
137
+ "Degree=Sup|POS=ADJ",
138
+ "Degree=Pos|Number=Sing|POS=ADJ",
139
+ "Mood=Imp|POS=VERB",
140
+ "Case=Nom|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs",
141
+ "Case=Acc|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs",
142
+ "POS=X",
143
  "Case=Gen|Definite=Def|Gender=Com|Number=Plur|POS=NOUN",
 
 
 
 
 
 
 
144
  "Number=Plur|POS=PRON|PronType=Dem",
145
+ "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs",
146
+ "Number=Plur|POS=PRON|PronType=Int,Rel",
147
+ "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
 
 
148
  "Degree=Cmp|Number=Plur|POS=ADJ",
149
+ "Number=Plur|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs",
 
 
 
150
  "Gender=Com|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form",
151
+ "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs",
152
+ "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs",
153
+ "Gender=Com|POS=PRON|PronType=Int,Rel",
154
+ "Case=Gen|Degree=Pos|Number=Plur|POS=ADJ",
155
+ "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
156
+ "POS=VERB|VerbForm=Ger",
157
+ "Gender=Com|Number=Sing|POS=PRON|PronType=Dem",
158
+ "Case=Gen|POS=PRON|PronType=Int,Rel",
159
+ "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Pass",
160
+ "Abbr=Yes|POS=X",
161
+ "Case=Gen|Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN",
162
+ "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs",
163
+ "Definite=Ind|Number=Plur|POS=NOUN",
164
+ "Foreign=Yes|POS=X",
165
  "Number=Plur|POS=PRON|PronType=Rcp",
166
+ "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs",
167
  "Case=Gen|Degree=Cmp|POS=ADJ",
168
  "Case=Gen|Definite=Def|Gender=Neut|Number=Plur|POS=NOUN",
169
+ "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs",
170
+ "Gender=Neut|Number=Sing|POS=PRON|PronType=Dem",
 
 
 
 
 
 
 
171
  "Number=Plur|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form",
172
+ "Gender=Neut|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form",
 
 
 
 
 
173
  "Number=Plur|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
174
+ "Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs",
175
+ "Case=Gen|Number=Plur|POS=PRON|PronType=Rcp",
 
 
 
 
176
  "POS=DET|Person=2|Polite=Form|Poss=Yes|PronType=Prs",
177
+ "POS=SYM",
178
+ "POS=DET|PronType=Dem",
179
+ "Gender=Com|Number=Sing|POS=NUM",
180
+ "Number[psor]=Plur|POS=DET|Person=2|Poss=Yes|PronType=Prs",
181
+ "Case=Gen|Number=Plur|POS=VERB|Tense=Past|VerbForm=Part",
 
 
182
  "Definite=Def|Degree=Abs|POS=ADJ",
183
+ "POS=VERB|Tense=Pres",
184
+ "Definite=Ind|Gender=Neut|Number=Sing|POS=NUM",
 
 
 
 
185
  "Degree=Abs|POS=ADV",
 
 
 
 
186
  "Case=Gen|Definite=Def|Degree=Pos|Number=Sing|POS=ADJ",
 
 
187
  "Gender=Com|Number=Sing|POS=PRON|PronType=Int,Rel",
188
+ "POS=VERB|Tense=Past|VerbForm=Part",
189
+ "Definite=Ind|Degree=Sup|Number=Sing|POS=ADJ",
 
 
190
  "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs",
191
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=PRON|Person=1|Poss=Yes|PronType=Prs",
 
 
192
  "Number=Plur|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs",
193
+ "Number[psor]=Plur|POS=PRON|Person=3|Poss=Yes|PronType=Prs",
194
+ "Definite=Ind|POS=NOUN",
195
  "Case=Gen|Gender=Com|Number=Sing|POS=DET|PronType=Ind",
196
+ "Definite=Ind|Gender=Com|Number=Sing|POS=NUM",
197
+ "Definite=Def|Number=Plur|POS=NOUN",
198
  "Case=Gen|POS=NOUN",
199
+ "POS=AUX|Tense=Pres|VerbForm=Part"
 
 
200
  ],
201
  "parser":[
202
  "ROOT",
203
  "acl:relcl",
204
  "advcl",
205
  "advmod",
206
+ "advmod:lmod",
207
  "amod",
208
  "appos",
209
  "aux",
 
227
  "nummod",
228
  "obj",
229
  "obl",
230
+ "obl:lmod",
231
  "obl:tmod",
232
  "punct",
233
  "xcomp"
 
 
 
 
 
 
234
  ],
235
  "ner":[
236
  "LOC",
237
  "MISC",
238
  "ORG",
239
  "PER"
240
+ ],
241
+ "coref":[
242
+
243
+ ],
244
+ "span_resolver":[
245
+
246
+ ],
247
+ "entity_linker":[
248
+
249
  ]
250
  },
251
  "pipeline":[
252
  "transformer",
253
+ "tagger",
254
  "morphologizer",
255
+ "trainable_lemmatizer",
256
  "parser",
257
+ "ner",
258
+ "coref",
259
+ "span_resolver",
260
+ "span_cleaner",
261
+ "entity_linker"
262
  ],
263
  "components":[
264
  "transformer",
265
+ "tagger",
266
  "morphologizer",
267
+ "trainable_lemmatizer",
268
  "parser",
269
+ "ner",
270
+ "coref",
271
+ "span_resolver",
272
+ "span_cleaner",
273
+ "entity_linker"
274
  ],
275
  "disabled":[
276
 
277
  ],
278
+ "requirements":[
279
+ "spacy-experimental>=0.6.2,<0.7.0",
280
+ "spacy-transformers>=1.2.3,<1.3.0"
281
+ ],
282
  "performance":{
283
+ "token_acc":0.9992023928,
284
+ "token_p":0.9970089731,
285
+ "token_r":0.9977052779,
286
+ "token_f":0.9973570039,
287
+ "sents_p":1.0,
288
+ "sents_r":1.0,
289
+ "sents_f":1.0,
290
+ "tag_acc":0.9913668347,
291
+ "pos_acc":0.9908174469,
292
+ "morph_acc":0.9880227568,
293
+ "morph_micro_p":0.9945294243,
294
+ "morph_micro_r":0.9932106296,
295
+ "morph_micro_f":0.9938695894,
296
  "morph_per_feat":{
297
+ "NumType":{
298
+ "p":0.9826589595,
299
+ "r":0.988372093,
300
+ "f":0.9855072464
 
 
 
 
 
301
  },
302
+ "Degree":{
303
+ "p":0.9973753281,
304
+ "r":0.9819121447,
305
+ "f":0.9895833333
306
  },
307
+ "Number":{
308
+ "p":0.991821771,
309
+ "r":0.9912626832,
310
+ "f":0.9915421483
311
  },
312
  "Definite":{
313
+ "p":0.9910141207,
314
+ "r":0.9910141207,
315
+ "f":0.9910141207
316
  },
317
  "Gender":{
318
+ "p":0.9905329593,
319
+ "r":0.9901857694,
320
+ "f":0.9903593339
321
  },
322
+ "Mood":{
323
+ "p":0.9990393852,
324
+ "r":0.9980806142,
325
+ "f":0.9985597696
326
  },
327
+ "Tense":{
328
+ "p":0.9953343701,
329
+ "r":0.9976617303,
330
+ "f":0.9964966913
331
  },
332
+ "VerbForm":{
333
+ "p":0.9956140351,
334
+ "r":0.9968632371,
335
+ "f":0.9962382445
336
+ },
337
+ "Voice":{
338
+ "p":0.9970149254,
339
+ "r":0.9962714392,
340
+ "f":0.9966430436
341
+ },
342
+ "AdpType":{
343
  "p":1.0,
344
  "r":1.0,
345
  "f":1.0
346
  },
347
  "PronType":{
348
+ "p":0.9990966576,
349
+ "r":0.9981949458,
350
+ "f":0.9986455982
351
  },
352
+ "Case":{
353
+ "p":1.0,
354
+ "r":0.992248062,
355
+ "f":0.9961089494
356
  },
357
+ "Person":{
358
+ "p":0.9982638889,
359
+ "r":0.9965337955,
360
+ "f":0.9973980919
361
  },
362
+ "Number[psor]":{
363
  "p":1.0,
364
  "r":1.0,
365
  "f":1.0
366
  },
367
+ "Poss":{
368
  "p":1.0,
369
  "r":1.0,
370
  "f":1.0
371
  },
372
+ "PartType":{
373
+ "p":0.9962406015,
374
+ "r":0.9962406015,
375
+ "f":0.9962406015
376
+ },
377
+ "Reflex":{
378
  "p":1.0,
379
  "r":1.0,
380
  "f":1.0
381
  },
382
  "Foreign":{
383
+ "p":0.0,
384
+ "r":0.0,
385
+ "f":0.0
 
 
 
 
 
386
  },
387
  "Style":{
388
  "p":1.0,
 
391
  },
392
  "Polite":{
393
  "p":1.0,
394
+ "r":0.6666666667,
395
+ "f":0.8
396
+ },
397
+ "Abbr":{
398
+ "p":1.0,
399
+ "r":0.5,
400
+ "f":0.6666666667
401
  }
402
  },
403
+ "dep_uas":0.9280885781,
404
+ "dep_las":0.9079997669,
405
  "dep_las_per_type":{
406
+ "nummod":{
407
+ "p":0.8738738739,
408
+ "r":0.8584070796,
409
+ "f":0.8660714286
410
  },
411
+ "amod":{
412
+ "p":0.9130434783,
413
+ "r":0.9247706422,
414
+ "f":0.9188696445
415
  },
416
+ "nmod":{
417
+ "p":0.8213507625,
418
+ "r":0.8231441048,
419
+ "f":0.8222464558
420
  },
421
+ "nsubj":{
422
+ "p":0.9587737844,
423
+ "r":0.9597883598,
424
+ "f":0.9592808038
425
  },
426
+ "flat":{
427
+ "p":0.9672131148,
428
+ "r":0.9414893617,
429
+ "f":0.9541778976
430
  },
431
  "cc":{
432
+ "p":0.9019607843,
433
+ "r":0.9139072848,
434
+ "f":0.9078947368
435
  },
436
  "conj":{
437
+ "p":0.8904899135,
438
+ "r":0.8930635838,
439
+ "f":0.8917748918
440
  },
441
+ "root":{
442
+ "p":0.9468085106,
443
+ "r":0.9451327434,
444
+ "f":0.9459698849
445
+ },
446
+ "advmod":{
447
+ "p":0.9056316591,
448
+ "r":0.892053973,
449
+ "f":0.8987915408
450
+ },
451
+ "mark":{
452
+ "p":0.9572072072,
453
+ "r":0.9465478842,
454
+ "f":0.9518477044
455
  },
456
  "aux":{
457
+ "p":0.9782608696,
458
+ "r":0.9692307692,
459
+ "f":0.9737248841
460
  },
461
+ "ccomp":{
462
+ "p":0.7931034483,
463
+ "r":0.8734177215,
464
+ "f":0.8313253012
465
  },
466
+ "case":{
467
+ "p":0.9511677282,
468
+ "r":0.9401888772,
469
+ "f":0.945646438
470
  },
471
  "det":{
472
+ "p":0.96,
473
+ "r":0.9677419355,
474
+ "f":0.9638554217
475
  },
476
+ "obl":{
477
+ "p":0.8901453958,
478
+ "r":0.8732171157,
479
+ "f":0.8816
480
  },
481
  "nmod:poss":{
482
+ "p":0.8245614035,
483
+ "r":0.8623853211,
484
+ "f":0.8430493274
485
  },
486
+ "obj":{
487
+ "p":0.9362101313,
488
+ "r":0.9504761905,
489
+ "f":0.943289225
 
 
 
 
 
490
  },
491
+ "cop":{
492
+ "p":0.9036144578,
493
+ "r":0.9202453988,
494
+ "f":0.9118541033
495
  },
496
+ "acl:relcl":{
497
+ "p":0.8554913295,
498
+ "r":0.8087431694,
499
+ "f":0.8314606742
500
  },
501
  "advcl":{
502
+ "p":0.754601227,
503
+ "r":0.7884615385,
504
+ "f":0.7711598746
 
 
 
 
 
505
  },
506
+ "compound:prt":{
507
+ "p":0.7,
508
+ "r":0.6176470588,
509
+ "f":0.65625
510
  },
511
  "dep":{
512
+ "p":0.1368421053,
513
+ "r":0.4333333333,
514
+ "f":0.208
515
  },
516
+ "fixed":{
517
+ "p":0.9655172414,
518
+ "r":0.9032258065,
519
+ "f":0.9333333333
520
  },
521
  "iobj":{
522
+ "p":0.9230769231,
523
+ "r":0.8,
524
+ "f":0.8571428571
525
+ },
526
+ "appos":{
527
+ "p":0.8,
528
+ "r":0.7368421053,
529
+ "f":0.7671232877
530
+ },
531
+ "obl:tmod":{
532
+ "p":0.75,
533
+ "r":0.375,
534
+ "f":0.5
535
  },
536
  "xcomp":{
537
+ "p":0.92,
538
+ "r":0.71875,
539
+ "f":0.8070175439
540
+ },
541
+ "advmod:lmod":{
542
+ "p":0.8775510204,
543
+ "r":0.8958333333,
544
+ "f":0.8865979381
545
+ },
546
+ "expl":{
547
+ "p":0.972972973,
548
+ "r":0.9230769231,
549
+ "f":0.9473684211
550
  },
551
  "list":{
552
+ "p":0.3636363636,
553
+ "r":0.2352941176,
554
+ "f":0.2857142857
555
  },
556
+ "obl:lmod":{
557
+ "p":0.5,
558
+ "r":0.3333333333,
559
+ "f":0.4
560
+ },
561
+ "parataxis":{
562
  "p":0.0,
563
  "r":0.0,
564
  "f":0.0
565
  },
566
+ "orphan":{
567
+ "p":0.0,
568
+ "r":0.0,
569
+ "f":0.0
570
  },
571
+ "vocative":{
572
+ "p":0.0,
573
+ "r":0.0,
574
+ "f":0.0
575
  },
576
+ "discourse":{
577
+ "p":0.0,
578
+ "r":0.0,
579
+ "f":0.0
580
  },
581
+ "dislocated":{
582
+ "p":0.0,
583
+ "r":0.0,
584
+ "f":0.0
585
  },
586
+ "compound":{
587
  "p":0.0,
588
  "r":0.0,
589
  "f":0.0
590
  }
591
  },
592
+ "ents_p":0.8858195212,
593
+ "ents_r":0.8620071685,
594
+ "ents_f":0.8737511353,
 
 
 
 
595
  "ents_per_type":{
596
+ "LOC":{
597
+ "p":0.8613861386,
598
+ "r":0.90625,
599
+ "f":0.883248731
600
+ },
601
  "PER":{
602
+ "p":0.9550561798,
603
+ "r":0.9444444444,
604
+ "f":0.9497206704
605
  },
606
  "ORG":{
607
+ "p":0.8819444444,
608
+ "r":0.7888198758,
609
+ "f":0.8327868852
610
  },
611
  "MISC":{
612
+ "p":0.8083333333,
613
+ "r":0.8016528926,
614
+ "f":0.8049792531
615
+ }
616
+ },
617
+ "lemma_acc":0.9589423796,
618
+ "coref_lea_f1":0.4672143289,
619
+ "coref_lea_precision":0.4590991705,
620
+ "coref_lea_recall":0.4756215411,
621
+ "nel_score":0.3428571429,
622
+ "nel_score_desc":"micro F",
623
+ "nel_micro_p":0.84,
624
+ "nel_micro_r":0.2153846154,
625
+ "nel_micro_f":0.3428571429,
626
+ "nel_macro_p":0.8670634921,
627
+ "nel_macro_r":0.2470462544,
628
+ "nel_macro_f":0.3727980563,
629
+ "nel_f_per_type":{
630
+ "MISC":{
631
+ "p":1.0,
632
+ "r":0.2777777778,
633
+ "f":0.4347826087
634
+ },
635
+ "PER":{
636
+ "p":0.8571428571,
637
+ "r":0.1,
638
+ "f":0.1791044776
639
  },
640
  "LOC":{
641
+ "p":1.0,
642
+ "r":0.4411764706,
643
+ "f":0.612244898
644
+ },
645
+ "ORG":{
646
+ "p":0.6111111111,
647
+ "r":0.1692307692,
648
+ "f":0.265060241
649
  }
650
+ }
 
 
 
 
651
  },
652
  "sources":[
653
  {
654
+ "name":"UD Danish DDT v2.11",
655
  "url":"https://github.com/UniversalDependencies/UD_Danish-DDT",
656
  "license":"CC BY-SA 4.0",
657
  "author":"Johannsen, Anders; Mart\u00ednez Alonso, H\u00e9ctor; Plank, Barbara"
658
  },
659
  {
660
  "name":"DaNE",
661
+ "url":"https://huggingface.co/datasets/dane",
662
  "license":"CC BY-SA 4.0",
663
  "author":"Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders S\u00f8gaard"
664
  },
665
  {
666
+ "name":"DaCoref",
667
+ "url":"https://huggingface.co/datasets/alexandrainst/dacoref",
668
+ "license":"CC BY-SA 4.0",
669
+ "author":"Buch-Kromann, Matthias"
670
+ },
671
+ {
672
+ "name":"DaNED",
673
+ "url":"https://danlp-alexandra.readthedocs.io/en/stable/docs/datasets.html#daned",
674
+ "license":"CC BY-SA 4.0",
675
+ "author":"Barrett, M. J., Lam, H., Wu, M., Lacroix, O., Plank, B., & S\u00f8gaard, A."
676
+ },
677
+ {
678
+ "name":"chcaa/dfm-encoder-large-v1",
679
+ "author":"The Danish Foundation Models team",
680
+ "url":"https://huggingface.co/chcaa/dfm-encoder-large-v1",
681
  "license":"CC BY 4.0"
682
  }
683
  ],
684
+ "notes":"\n\n### Training\nThis model was trained using [spaCy](https://spacy.io) and logged to [Weights & Biases](https://wandb.ai/kenevoldsen/dacy-v0.2.0). You can find all the training logs [here](https://wandb.ai/kenevoldsen/dacy-v0.2.0)."
 
 
 
685
  }
morphologizer/cfg CHANGED
@@ -1,4 +1,5 @@
1
  {
 
2
  "labels_morph":{
3
  "AdpType=Prep|POS=ADP":"AdpType=Prep",
4
  "Definite=Ind|Gender=Com|Number=Sing|POS=NOUN":"Definite=Ind|Gender=Com|Number=Sing",
@@ -14,149 +15,150 @@
14
  "Degree=Pos|Number=Plur|POS=ADJ":"Degree=Pos|Number=Plur",
15
  "Definite=Ind|Gender=Com|Number=Plur|POS=NOUN":"Definite=Ind|Gender=Com|Number=Plur",
16
  "POS=PUNCT":"",
 
17
  "POS=CCONJ":"",
18
- "Definite=Ind|Degree=Cmp|Number=Sing|POS=ADJ":"Definite=Ind|Degree=Cmp|Number=Sing",
19
- "Degree=Cmp|POS=ADJ":"Degree=Cmp",
20
- "POS=PRON|PartType=Inf":"PartType=Inf",
21
- "Gender=Com|Number=Sing|POS=DET|PronType=Ind":"Gender=Com|Number=Sing|PronType=Ind",
22
- "Definite=Ind|Degree=Pos|Number=Sing|POS=ADJ":"Definite=Ind|Degree=Pos|Number=Sing",
23
- "Case=Acc|Gender=Neut|Number=Sing|POS=PRON|Person=3|PronType=Prs":"Case=Acc|Gender=Neut|Number=Sing|Person=3|PronType=Prs",
24
  "Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN":"Definite=Ind|Gender=Neut|Number=Plur",
25
- "Definite=Def|Degree=Pos|Number=Sing|POS=ADJ":"Definite=Def|Degree=Pos|Number=Sing",
26
- "Gender=Neut|Number=Sing|POS=DET|PronType=Dem":"Gender=Neut|Number=Sing|PronType=Dem",
 
27
  "Degree=Pos|POS=ADV":"Degree=Pos",
28
- "Definite=Def|Number=Sing|POS=VERB|Tense=Past|VerbForm=Part":"Definite=Def|Number=Sing|Tense=Past|VerbForm=Part",
29
- "Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN":"Definite=Ind|Gender=Neut|Number=Sing",
30
- "POS=PRON|PronType=Dem":"PronType=Dem",
31
- "NumType=Card|POS=NUM":"NumType=Card",
32
- "Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ":"Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing",
33
- "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs":"Case=Acc|Gender=Com|Number=Sing|Person=3|PronType=Prs",
34
- "Degree=Pos|Gender=Com|Number=Sing|POS=ADJ":"Degree=Pos|Gender=Com|Number=Sing",
35
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs":"Case=Nom|Gender=Com|Number=Sing|Person=3|PronType=Prs",
36
- "NumType=Ord|POS=ADJ":"NumType=Ord",
37
- "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":"Gender=Com|Number=Sing|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
38
  "Mood=Ind|POS=AUX|Tense=Past|VerbForm=Fin|Voice=Act":"Mood=Ind|Tense=Past|VerbForm=Fin|Voice=Act",
39
- "POS=VERB|VerbForm=Inf|Voice=Act":"VerbForm=Inf|Voice=Act",
 
40
  "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Act":"Mood=Ind|Tense=Past|VerbForm=Fin|Voice=Act",
41
- "POS=NOUN":"",
42
- "Mood=Ind|POS=VERB|Tense=Pres|VerbForm=Fin|Voice=Pass":"Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Pass",
43
  "POS=ADP|PartType=Inf":"PartType=Inf",
 
 
44
  "Degree=Pos|POS=ADJ":"Degree=Pos",
 
 
 
45
  "Definite=Def|Gender=Com|Number=Plur|POS=NOUN":"Definite=Def|Gender=Com|Number=Plur",
46
- "Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs":"Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs",
 
 
 
47
  "Case=Gen|Definite=Def|Gender=Com|Number=Sing|POS=NOUN":"Case=Gen|Definite=Def|Gender=Com|Number=Sing",
 
 
48
  "POS=AUX|VerbForm=Inf|Voice=Act":"VerbForm=Inf|Voice=Act",
49
- "Definite=Ind|Degree=Pos|Gender=Com|Number=Sing|POS=ADJ":"Definite=Ind|Degree=Pos|Gender=Com|Number=Sing",
 
 
 
 
 
 
 
50
  "Gender=Com|Number=Sing|POS=DET|PronType=Dem":"Gender=Com|Number=Sing|PronType=Dem",
51
- "Number=Plur|POS=DET|PronType=Ind":"Number=Plur|PronType=Ind",
52
- "Gender=Com|Number=Sing|POS=PRON|PronType=Ind":"Gender=Com|Number=Sing|PronType=Ind",
53
- "Case=Acc|POS=PRON|Person=3|PronType=Prs|Reflex=Yes":"Case=Acc|Person=3|PronType=Prs|Reflex=Yes",
54
- "POS=PART|PartType=Inf":"PartType=Inf",
 
 
 
 
 
 
 
 
 
55
  "Gender=Neut|Number=Sing|POS=DET|PronType=Ind":"Gender=Neut|Number=Sing|PronType=Ind",
56
- "Case=Acc|Number=Plur|POS=PRON|Person=3|PronType=Prs":"Case=Acc|Number=Plur|Person=3|PronType=Prs",
57
- "Case=Gen|Definite=Def|Gender=Neut|Number=Sing|POS=NOUN":"Case=Gen|Definite=Def|Gender=Neut|Number=Sing",
58
- "Case=Nom|Number=Plur|POS=PRON|Person=3|PronType=Prs":"Case=Nom|Number=Plur|Person=3|PronType=Prs",
 
 
 
 
 
 
59
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs":"Case=Nom|Gender=Com|Number=Sing|Person=1|PronType=Prs",
60
- "Case=Nom|Gender=Com|POS=PRON|PronType=Ind":"Case=Nom|Gender=Com|PronType=Ind",
61
- "Gender=Neut|Number=Sing|POS=PRON|PronType=Ind":"Gender=Neut|Number=Sing|PronType=Ind",
62
- "Mood=Imp|POS=VERB":"Mood=Imp",
63
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":"Gender=Com|Number=Sing|Number[psor]=Sing|Person=1|Poss=Yes|PronType=Prs",
64
- "Definite=Ind|Number=Sing|POS=AUX|Tense=Past|VerbForm=Part":"Definite=Ind|Number=Sing|Tense=Past|VerbForm=Part",
65
- "POS=X":"",
 
 
 
 
 
 
 
66
  "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs":"Case=Nom|Gender=Com|Number=Plur|Person=1|PronType=Prs",
67
  "Case=Gen|Definite=Def|Gender=Com|Number=Plur|POS=NOUN":"Case=Gen|Definite=Def|Gender=Com|Number=Plur",
68
- "POS=VERB|Tense=Pres|VerbForm=Part":"Tense=Pres|VerbForm=Part",
69
- "Number=Plur|POS=PRON|PronType=Int,Rel":"Number=Plur|PronType=Int,Rel",
70
- "POS=VERB|VerbForm=Inf|Voice=Pass":"VerbForm=Inf|Voice=Pass",
71
- "Case=Gen|Definite=Ind|Gender=Com|Number=Sing|POS=NOUN":"Case=Gen|Definite=Ind|Gender=Com|Number=Sing",
72
- "Degree=Cmp|POS=ADV":"Degree=Cmp",
73
- "POS=ADV|PartType=Inf":"PartType=Inf",
74
- "Degree=Sup|POS=ADV":"Degree=Sup",
75
  "Number=Plur|POS=PRON|PronType=Dem":"Number=Plur|PronType=Dem",
76
- "Number=Plur|POS=PRON|PronType=Ind":"Number=Plur|PronType=Ind",
77
- "Definite=Def|Gender=Neut|Number=Plur|POS=NOUN":"Definite=Def|Gender=Neut|Number=Plur",
78
- "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs":"Case=Acc|Gender=Com|Number=Sing|Person=1|PronType=Prs",
79
- "Case=Gen|POS=PROPN":"Case=Gen",
80
- "POS=ADP":"",
81
  "Degree=Cmp|Number=Plur|POS=ADJ":"Degree=Cmp|Number=Plur",
82
- "Definite=Def|Degree=Sup|POS=ADJ":"Definite=Def|Degree=Sup",
83
- "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":"Gender=Neut|Number=Sing|Number[psor]=Sing|Person=1|Poss=Yes|PronType=Prs",
84
- "Degree=Pos|Number=Sing|POS=ADJ":"Degree=Pos|Number=Sing",
85
- "Number=Plur|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":"Number=Plur|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
86
  "Gender=Com|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":"Gender=Com|Number=Sing|Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs|Style=Form",
87
  "Number=Plur|POS=PRON|PronType=Rcp":"Number=Plur|PronType=Rcp",
88
  "Case=Gen|Degree=Cmp|POS=ADJ":"Case=Gen|Degree=Cmp",
89
  "Case=Gen|Definite=Def|Gender=Neut|Number=Plur|POS=NOUN":"Case=Gen|Definite=Def|Gender=Neut|Number=Plur",
90
- "Number[psor]=Plur|POS=DET|Person=3|Poss=Yes|PronType=Prs":"Number[psor]=Plur|Person=3|Poss=Yes|PronType=Prs",
91
- "POS=INTJ":"",
92
- "Number=Plur|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":"Number=Plur|Number[psor]=Sing|Person=1|Poss=Yes|PronType=Prs",
93
- "Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ":"Degree=Pos|Gender=Neut|Number=Sing",
94
- "Gender=Neut|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":"Gender=Neut|Number=Sing|Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs|Style=Form",
95
- "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs":"Case=Acc|Gender=Com|Number=Sing|Person=2|PronType=Prs",
96
- "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":"Gender=Com|Number=Sing|Number[psor]=Sing|Person=2|Poss=Yes|PronType=Prs",
97
- "Case=Gen|Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN":"Case=Gen|Definite=Ind|Gender=Neut|Number=Plur",
98
- "Number=Sing|POS=PRON|PronType=Int,Rel":"Number=Sing|PronType=Int,Rel",
99
  "Number=Plur|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":"Number=Plur|Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs|Style=Form",
100
- "Gender=Neut|Number=Sing|POS=PRON|PronType=Int,Rel":"Gender=Neut|Number=Sing|PronType=Int,Rel",
101
- "Definite=Def|Degree=Sup|Number=Plur|POS=ADJ":"Definite=Def|Degree=Sup|Number=Plur",
102
- "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs":"Case=Nom|Gender=Com|Number=Sing|Person=2|PronType=Prs",
103
- "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":"Gender=Neut|Number=Sing|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
104
- "Definite=Ind|Number=Sing|POS=NOUN":"Definite=Ind|Number=Sing",
105
- "Number=Plur|POS=VERB|Tense=Past|VerbForm=Part":"Number=Plur|Tense=Past|VerbForm=Part",
106
  "Number=Plur|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":"Number=Plur|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
107
- "POS=SYM":"",
108
- "Case=Nom|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs":"Case=Nom|Gender=Com|Person=2|Polite=Form|PronType=Prs",
109
- "Degree=Sup|POS=ADJ":"Degree=Sup",
110
- "Number=Plur|POS=DET|PronType=Ind|Style=Arch":"Number=Plur|PronType=Ind|Style=Arch",
111
- "Case=Gen|Gender=Com|Number=Sing|POS=DET|PronType=Dem":"Case=Gen|Gender=Com|Number=Sing|PronType=Dem",
112
- "Foreign=Yes|POS=X":"Foreign=Yes",
113
  "POS=DET|Person=2|Polite=Form|Poss=Yes|PronType=Prs":"Person=2|Polite=Form|Poss=Yes|PronType=Prs",
114
- "Gender=Neut|Number=Sing|POS=PRON|PronType=Dem":"Gender=Neut|Number=Sing|PronType=Dem",
115
- "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs":"Case=Acc|Gender=Com|Number=Plur|Person=1|PronType=Prs",
116
- "Case=Gen|Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN":"Case=Gen|Definite=Ind|Gender=Neut|Number=Sing",
117
- "Case=Gen|POS=PRON|PronType=Int,Rel":"Case=Gen|PronType=Int,Rel",
118
- "Gender=Com|Number=Sing|POS=PRON|PronType=Dem":"Gender=Com|Number=Sing|PronType=Dem",
119
- "Abbr=Yes|POS=X":"Abbr=Yes",
120
- "Case=Gen|Definite=Ind|Gender=Com|Number=Plur|POS=NOUN":"Case=Gen|Definite=Ind|Gender=Com|Number=Plur",
121
  "Definite=Def|Degree=Abs|POS=ADJ":"Definite=Def|Degree=Abs",
122
- "Definite=Ind|Degree=Sup|Number=Sing|POS=ADJ":"Definite=Ind|Degree=Sup|Number=Sing",
123
- "Definite=Ind|POS=NOUN":"Definite=Ind",
124
- "Gender=Com|Number=Plur|POS=NOUN":"Gender=Com|Number=Plur",
125
- "Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs":"Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs",
126
- "Gender=Com|POS=PRON|PronType=Int,Rel":"Gender=Com|PronType=Int,Rel",
127
- "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs":"Case=Nom|Gender=Com|Number=Plur|Person=2|PronType=Prs",
128
  "Degree=Abs|POS=ADV":"Degree=Abs",
129
- "POS=VERB|VerbForm=Ger":"VerbForm=Ger",
130
- "POS=VERB|Tense=Past|VerbForm=Part":"Tense=Past|VerbForm=Part",
131
- "Definite=Def|Degree=Sup|Number=Sing|POS=ADJ":"Definite=Def|Degree=Sup|Number=Sing",
132
- "Number=Plur|Number[psor]=Plur|POS=PRON|Person=1|Poss=Yes|PronType=Prs|Style=Form":"Number=Plur|Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs|Style=Form",
133
  "Case=Gen|Definite=Def|Degree=Pos|Number=Sing|POS=ADJ":"Case=Gen|Definite=Def|Degree=Pos|Number=Sing",
134
- "Case=Gen|Degree=Pos|Number=Plur|POS=ADJ":"Case=Gen|Degree=Pos|Number=Plur",
135
- "Case=Acc|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs":"Case=Acc|Gender=Com|Person=2|Polite=Form|PronType=Prs",
136
  "Gender=Com|Number=Sing|POS=PRON|PronType=Int,Rel":"Gender=Com|Number=Sing|PronType=Int,Rel",
137
- "POS=VERB|Tense=Pres":"Tense=Pres",
138
- "Case=Gen|Number=Plur|POS=DET|PronType=Ind":"Case=Gen|Number=Plur|PronType=Ind",
139
- "Number[psor]=Plur|POS=DET|Person=2|Poss=Yes|PronType=Prs":"Number[psor]=Plur|Person=2|Poss=Yes|PronType=Prs",
140
- "POS=PRON|Person=2|Polite=Form|Poss=Yes|PronType=Prs":"Person=2|Polite=Form|Poss=Yes|PronType=Prs",
141
  "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":"Gender=Neut|Number=Sing|Number[psor]=Sing|Person=2|Poss=Yes|PronType=Prs",
142
- "POS=AUX|Tense=Pres|VerbForm=Part":"Tense=Pres|VerbForm=Part",
143
- "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Pass":"Mood=Ind|Tense=Past|VerbForm=Fin|Voice=Pass",
144
- "Gender=Com|Number=Sing|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":"Gender=Com|Number=Sing|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
145
- "Degree=Sup|Number=Plur|POS=ADJ":"Degree=Sup|Number=Plur",
146
- "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs":"Case=Acc|Gender=Com|Number=Plur|Person=2|PronType=Prs",
147
- "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":"Gender=Neut|Number=Sing|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
148
- "Definite=Ind|Number=Plur|POS=NOUN":"Definite=Ind|Number=Plur",
149
- "Case=Gen|Number=Plur|POS=VERB|Tense=Past|VerbForm=Part":"Case=Gen|Number=Plur|Tense=Past|VerbForm=Part",
150
- "Mood=Imp|POS=AUX":"Mood=Imp",
151
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=PRON|Person=1|Poss=Yes|PronType=Prs":"Gender=Com|Number=Sing|Number[psor]=Sing|Person=1|Poss=Yes|PronType=Prs",
152
- "Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs":"Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs",
153
- "Definite=Def|Gender=Com|Number=Sing|POS=VERB|Tense=Past|VerbForm=Part":"Definite=Def|Gender=Com|Number=Sing|Tense=Past|VerbForm=Part",
154
  "Number=Plur|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":"Number=Plur|Number[psor]=Sing|Person=2|Poss=Yes|PronType=Prs",
155
  "Case=Gen|Gender=Com|Number=Sing|POS=DET|PronType=Ind":"Case=Gen|Gender=Com|Number=Sing|PronType=Ind",
156
  "Case=Gen|POS=NOUN":"Case=Gen",
157
- "Number[psor]=Plur|POS=PRON|Person=3|Poss=Yes|PronType=Prs":"Number[psor]=Plur|Person=3|Poss=Yes|PronType=Prs",
158
- "POS=DET|PronType=Dem":"PronType=Dem",
159
- "Definite=Def|Number=Plur|POS=NOUN":"Definite=Def|Number=Plur"
160
  },
161
  "labels_pos":{
162
  "AdpType=Prep|POS=ADP":85,
@@ -173,148 +175,150 @@
173
  "Degree=Pos|Number=Plur|POS=ADJ":84,
174
  "Definite=Ind|Gender=Com|Number=Plur|POS=NOUN":92,
175
  "POS=PUNCT":97,
176
  "POS=CCONJ":89,
177
- "Definite=Ind|Degree=Cmp|Number=Sing|POS=ADJ":84,
178
- "Degree=Cmp|POS=ADJ":84,
179
- "POS=PRON|PartType=Inf":95,
180
- "Gender=Com|Number=Sing|POS=DET|PronType=Ind":90,
181
- "Definite=Ind|Degree=Pos|Number=Sing|POS=ADJ":84,
182
- "Case=Acc|Gender=Neut|Number=Sing|POS=PRON|Person=3|PronType=Prs":95,
183
  "Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN":92,
184
- "Definite=Def|Degree=Pos|Number=Sing|POS=ADJ":84,
185
- "Gender=Neut|Number=Sing|POS=DET|PronType=Dem":90,
186
  "Degree=Pos|POS=ADV":86,
187
- "Definite=Def|Number=Sing|POS=VERB|Tense=Past|VerbForm=Part":100,
188
- "Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN":92,
189
- "POS=PRON|PronType=Dem":95,
190
- "NumType=Card|POS=NUM":93,
191
- "Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ":84,
192
- "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs":95,
193
- "Degree=Pos|Gender=Com|Number=Sing|POS=ADJ":84,
194
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs":95,
195
- "NumType=Ord|POS=ADJ":84,
196
- "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":90,
197
  "Mood=Ind|POS=AUX|Tense=Past|VerbForm=Fin|Voice=Act":87,
198
- "POS=VERB|VerbForm=Inf|Voice=Act":100,
199
  "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Act":100,
200
- "POS=NOUN":92,
201
- "Mood=Ind|POS=VERB|Tense=Pres|VerbForm=Fin|Voice=Pass":100,
202
  "POS=ADP|PartType=Inf":85,
203
  "Degree=Pos|POS=ADJ":84,
204
  "Definite=Def|Gender=Com|Number=Plur|POS=NOUN":92,
205
- "Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs":90,
206
  "Case=Gen|Definite=Def|Gender=Com|Number=Sing|POS=NOUN":92,
207
  "POS=AUX|VerbForm=Inf|Voice=Act":87,
208
- "Definite=Ind|Degree=Pos|Gender=Com|Number=Sing|POS=ADJ":84,
209
  "Gender=Com|Number=Sing|POS=DET|PronType=Dem":90,
210
- "Number=Plur|POS=DET|PronType=Ind":90,
211
- "Gender=Com|Number=Sing|POS=PRON|PronType=Ind":95,
212
- "Case=Acc|POS=PRON|Person=3|PronType=Prs|Reflex=Yes":95,
213
- "POS=PART|PartType=Inf":94,
214
  "Gender=Neut|Number=Sing|POS=DET|PronType=Ind":90,
215
- "Case=Acc|Number=Plur|POS=PRON|Person=3|PronType=Prs":95,
216
- "Case=Gen|Definite=Def|Gender=Neut|Number=Sing|POS=NOUN":92,
217
- "Case=Nom|Number=Plur|POS=PRON|Person=3|PronType=Prs":95,
218
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs":95,
219
- "Case=Nom|Gender=Com|POS=PRON|PronType=Ind":95,
220
- "Gender=Neut|Number=Sing|POS=PRON|PronType=Ind":95,
221
- "Mood=Imp|POS=VERB":100,
222
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":90,
223
- "Definite=Ind|Number=Sing|POS=AUX|Tense=Past|VerbForm=Part":87,
224
- "POS=X":101,
225
  "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs":95,
226
  "Case=Gen|Definite=Def|Gender=Com|Number=Plur|POS=NOUN":92,
227
- "POS=VERB|Tense=Pres|VerbForm=Part":100,
228
- "Number=Plur|POS=PRON|PronType=Int,Rel":95,
229
- "POS=VERB|VerbForm=Inf|Voice=Pass":100,
230
- "Case=Gen|Definite=Ind|Gender=Com|Number=Sing|POS=NOUN":92,
231
- "Degree=Cmp|POS=ADV":86,
232
- "POS=ADV|PartType=Inf":86,
233
- "Degree=Sup|POS=ADV":86,
234
  "Number=Plur|POS=PRON|PronType=Dem":95,
235
- "Number=Plur|POS=PRON|PronType=Ind":95,
236
- "Definite=Def|Gender=Neut|Number=Plur|POS=NOUN":92,
237
- "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs":95,
238
- "Case=Gen|POS=PROPN":96,
239
- "POS=ADP":85,
240
  "Degree=Cmp|Number=Plur|POS=ADJ":84,
241
- "Definite=Def|Degree=Sup|POS=ADJ":84,
242
- "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":90,
243
- "Degree=Pos|Number=Sing|POS=ADJ":84,
244
- "Number=Plur|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":90,
245
  "Gender=Com|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":90,
246
  "Number=Plur|POS=PRON|PronType=Rcp":95,
247
  "Case=Gen|Degree=Cmp|POS=ADJ":84,
248
  "Case=Gen|Definite=Def|Gender=Neut|Number=Plur|POS=NOUN":92,
249
- "Number[psor]=Plur|POS=DET|Person=3|Poss=Yes|PronType=Prs":90,
250
- "POS=INTJ":91,
251
- "Number=Plur|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":90,
252
- "Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ":84,
253
- "Gender=Neut|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":90,
254
- "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs":95,
255
- "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":90,
256
- "Case=Gen|Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN":92,
257
- "Number=Sing|POS=PRON|PronType=Int,Rel":95,
258
  "Number=Plur|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":90,
259
- "Gender=Neut|Number=Sing|POS=PRON|PronType=Int,Rel":95,
260
- "Definite=Def|Degree=Sup|Number=Plur|POS=ADJ":84,
261
- "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs":95,
262
- "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":90,
263
- "Definite=Ind|Number=Sing|POS=NOUN":92,
264
- "Number=Plur|POS=VERB|Tense=Past|VerbForm=Part":100,
265
  "Number=Plur|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":95,
266
- "POS=SYM":99,
267
- "Case=Nom|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs":95,
268
- "Degree=Sup|POS=ADJ":84,
269
- "Number=Plur|POS=DET|PronType=Ind|Style=Arch":90,
270
- "Case=Gen|Gender=Com|Number=Sing|POS=DET|PronType=Dem":90,
271
- "Foreign=Yes|POS=X":101,
272
  "POS=DET|Person=2|Polite=Form|Poss=Yes|PronType=Prs":90,
273
- "Gender=Neut|Number=Sing|POS=PRON|PronType=Dem":95,
274
- "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs":95,
275
- "Case=Gen|Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN":92,
276
- "Case=Gen|POS=PRON|PronType=Int,Rel":95,
277
- "Gender=Com|Number=Sing|POS=PRON|PronType=Dem":95,
278
- "Abbr=Yes|POS=X":101,
279
- "Case=Gen|Definite=Ind|Gender=Com|Number=Plur|POS=NOUN":92,
280
  "Definite=Def|Degree=Abs|POS=ADJ":84,
281
- "Definite=Ind|Degree=Sup|Number=Sing|POS=ADJ":84,
282
- "Definite=Ind|POS=NOUN":92,
283
- "Gender=Com|Number=Plur|POS=NOUN":92,
284
- "Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs":90,
285
- "Gender=Com|POS=PRON|PronType=Int,Rel":95,
286
- "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs":95,
287
  "Degree=Abs|POS=ADV":86,
288
- "POS=VERB|VerbForm=Ger":100,
289
- "POS=VERB|Tense=Past|VerbForm=Part":100,
290
- "Definite=Def|Degree=Sup|Number=Sing|POS=ADJ":84,
291
- "Number=Plur|Number[psor]=Plur|POS=PRON|Person=1|Poss=Yes|PronType=Prs|Style=Form":95,
292
  "Case=Gen|Definite=Def|Degree=Pos|Number=Sing|POS=ADJ":84,
293
- "Case=Gen|Degree=Pos|Number=Plur|POS=ADJ":84,
294
- "Case=Acc|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs":95,
295
  "Gender=Com|Number=Sing|POS=PRON|PronType=Int,Rel":95,
296
- "POS=VERB|Tense=Pres":100,
297
- "Case=Gen|Number=Plur|POS=DET|PronType=Ind":90,
298
- "Number[psor]=Plur|POS=DET|Person=2|Poss=Yes|PronType=Prs":90,
299
- "POS=PRON|Person=2|Polite=Form|Poss=Yes|PronType=Prs":95,
300
  "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":90,
301
- "POS=AUX|Tense=Pres|VerbForm=Part":87,
302
- "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Pass":100,
303
- "Gender=Com|Number=Sing|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":95,
304
- "Degree=Sup|Number=Plur|POS=ADJ":84,
305
- "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs":95,
306
- "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":95,
307
- "Definite=Ind|Number=Plur|POS=NOUN":92,
308
- "Case=Gen|Number=Plur|POS=VERB|Tense=Past|VerbForm=Part":100,
309
- "Mood=Imp|POS=AUX":87,
310
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=PRON|Person=1|Poss=Yes|PronType=Prs":95,
311
- "Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs":95,
312
- "Definite=Def|Gender=Com|Number=Sing|POS=VERB|Tense=Past|VerbForm=Part":100,
313
  "Number=Plur|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":90,
314
  "Case=Gen|Gender=Com|Number=Sing|POS=DET|PronType=Ind":90,
315
  "Case=Gen|POS=NOUN":92,
316
- "Number[psor]=Plur|POS=PRON|Person=3|Poss=Yes|PronType=Prs":95,
317
- "POS=DET|PronType=Dem":90,
318
- "Definite=Def|Number=Plur|POS=NOUN":92
319
- }
320
  }
 
1
  {
2
+ "extend":false,
3
  "labels_morph":{
4
  "AdpType=Prep|POS=ADP":"AdpType=Prep",
5
  "Definite=Ind|Gender=Com|Number=Sing|POS=NOUN":"Definite=Ind|Gender=Com|Number=Sing",
 
15
  "Degree=Pos|Number=Plur|POS=ADJ":"Degree=Pos|Number=Plur",
16
  "Definite=Ind|Gender=Com|Number=Plur|POS=NOUN":"Definite=Ind|Gender=Com|Number=Plur",
17
  "POS=PUNCT":"",
18
+ "NumType=Ord|POS=ADJ":"NumType=Ord",
19
  "POS=CCONJ":"",
20
  "Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN":"Definite=Ind|Gender=Neut|Number=Plur",
21
+ "POS=VERB|VerbForm=Inf|Voice=Act":"VerbForm=Inf|Voice=Act",
22
+ "Case=Acc|Gender=Neut|Number=Sing|POS=PRON|Person=3|PronType=Prs":"Case=Acc|Gender=Neut|Number=Sing|Person=3|PronType=Prs",
23
+ "Degree=Sup|POS=ADV":"Degree=Sup",
24
  "Degree=Pos|POS=ADV":"Degree=Pos",
25
+ "Gender=Com|Number=Sing|POS=DET|PronType=Ind":"Gender=Com|Number=Sing|PronType=Ind",
26
+ "Number=Plur|POS=DET|PronType=Ind":"Number=Plur|PronType=Ind",
27
+ "POS=ADP":"",
28
+ "POS=ADV|PartType=Inf":"PartType=Inf",
29
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs":"Case=Nom|Gender=Com|Number=Sing|Person=3|PronType=Prs",
30
  "Mood=Ind|POS=AUX|Tense=Past|VerbForm=Fin|Voice=Act":"Mood=Ind|Tense=Past|VerbForm=Fin|Voice=Act",
31
+ "Definite=Def|Degree=Pos|Number=Sing|POS=ADJ":"Definite=Def|Degree=Pos|Number=Sing",
32
+ "Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs":"Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs",
33
  "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Act":"Mood=Ind|Tense=Past|VerbForm=Fin|Voice=Act",
34
  "POS=ADP|PartType=Inf":"PartType=Inf",
35
+ "Definite=Ind|Degree=Pos|Gender=Com|Number=Sing|POS=ADJ":"Definite=Ind|Degree=Pos|Gender=Com|Number=Sing",
36
+ "NumType=Card|POS=NUM":"NumType=Card",
37
  "Degree=Pos|POS=ADJ":"Degree=Pos",
38
+ "Definite=Ind|Number=Sing|POS=AUX|Tense=Past|VerbForm=Part":"Definite=Ind|Number=Sing|Tense=Past|VerbForm=Part",
39
+ "POS=PART|PartType=Inf":"PartType=Inf",
40
+ "Case=Acc|POS=PRON|Person=3|PronType=Prs|Reflex=Yes":"Case=Acc|Person=3|PronType=Prs|Reflex=Yes",
41
  "Definite=Def|Gender=Com|Number=Plur|POS=NOUN":"Definite=Def|Gender=Com|Number=Plur",
42
+ "Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN":"Definite=Ind|Gender=Neut|Number=Sing",
43
+ "Number[psor]=Plur|POS=DET|Person=3|Poss=Yes|PronType=Prs":"Number[psor]=Plur|Person=3|Poss=Yes|PronType=Prs",
44
+ "POS=VERB|Tense=Pres|VerbForm=Part":"Tense=Pres|VerbForm=Part",
45
+ "Case=Nom|Number=Plur|POS=PRON|Person=3|PronType=Prs":"Case=Nom|Number=Plur|Person=3|PronType=Prs",
46
  "Case=Gen|Definite=Def|Gender=Com|Number=Sing|POS=NOUN":"Case=Gen|Definite=Def|Gender=Com|Number=Sing",
47
+ "Definite=Def|Degree=Sup|Number=Plur|POS=ADJ":"Definite=Def|Degree=Sup|Number=Plur",
48
+ "Case=Acc|Number=Plur|POS=PRON|Person=3|PronType=Prs":"Case=Acc|Number=Plur|Person=3|PronType=Prs",
49
  "POS=AUX|VerbForm=Inf|Voice=Act":"VerbForm=Inf|Voice=Act",
50
+ "Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ":"Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing",
51
+ "Definite=Ind|Degree=Cmp|Number=Sing|POS=ADJ":"Definite=Ind|Degree=Cmp|Number=Sing",
52
+ "Degree=Cmp|POS=ADJ":"Degree=Cmp",
53
+ "POS=PRON|PartType=Inf":"PartType=Inf",
54
+ "Definite=Ind|Degree=Pos|Number=Sing|POS=ADJ":"Definite=Ind|Degree=Pos|Number=Sing",
55
+ "Case=Nom|Gender=Com|POS=PRON|PronType=Ind":"Case=Nom|Gender=Com|PronType=Ind",
56
+ "Number=Plur|POS=PRON|PronType=Ind":"Number=Plur|PronType=Ind",
57
+ "POS=INTJ":"",
58
  "Gender=Com|Number=Sing|POS=DET|PronType=Dem":"Gender=Com|Number=Sing|PronType=Dem",
59
+ "Case=Gen|Number=Plur|POS=DET|PronType=Ind":"Case=Gen|Number=Plur|PronType=Ind",
60
+ "Mood=Ind|POS=VERB|Tense=Pres|VerbForm=Fin|Voice=Pass":"Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Pass",
61
+ "Definite=Def|Gender=Neut|Number=Plur|POS=NOUN":"Definite=Def|Gender=Neut|Number=Plur",
62
+ "Degree=Cmp|POS=ADV":"Degree=Cmp",
63
+ "Number=Plur|Number[psor]=Plur|POS=PRON|Person=1|Poss=Yes|PronType=Prs|Style=Form":"Number=Plur|Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs|Style=Form",
64
+ "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs":"Case=Acc|Gender=Com|Number=Sing|Person=3|PronType=Prs",
65
+ "Number=Plur|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":"Number=Plur|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
66
+ "Case=Gen|POS=PROPN":"Case=Gen",
67
+ "Gender=Neut|Number=Sing|POS=PRON|PronType=Ind":"Gender=Neut|Number=Sing|PronType=Ind",
68
+ "Number=Plur|POS=VERB|Tense=Past|VerbForm=Part":"Number=Plur|Tense=Past|VerbForm=Part",
69
+ "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":"Gender=Neut|Number=Sing|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
70
+ "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs":"Case=Acc|Gender=Com|Number=Sing|Person=1|PronType=Prs",
71
+ "Definite=Def|Degree=Sup|POS=ADJ":"Definite=Def|Degree=Sup",
72
  "Gender=Neut|Number=Sing|POS=DET|PronType=Ind":"Gender=Neut|Number=Sing|PronType=Ind",
73
+ "Case=Gen|Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN":"Case=Gen|Definite=Ind|Gender=Neut|Number=Sing",
74
+ "Gender=Neut|Number=Sing|POS=DET|PronType=Dem":"Gender=Neut|Number=Sing|PronType=Dem",
75
+ "Definite=Def|Number=Sing|POS=VERB|Tense=Past|VerbForm=Part":"Definite=Def|Number=Sing|Tense=Past|VerbForm=Part",
76
+ "POS=PRON|PronType=Dem":"PronType=Dem",
77
+ "Degree=Pos|Gender=Com|Number=Sing|POS=ADJ":"Degree=Pos|Gender=Com|Number=Sing",
78
+ "Number=Plur|POS=NUM":"Number=Plur",
79
+ "POS=VERB|VerbForm=Inf|Voice=Pass":"VerbForm=Inf|Voice=Pass",
80
+ "Definite=Def|Degree=Sup|Number=Sing|POS=ADJ":"Definite=Def|Degree=Sup|Number=Sing",
81
+ "Number=Sing|POS=PRON|PronType=Int,Rel":"Number=Sing|PronType=Int,Rel",
82
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs":"Case=Nom|Gender=Com|Number=Sing|Person=1|PronType=Prs",
83
+ "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":"Gender=Neut|Number=Sing|Number[psor]=Sing|Person=1|Poss=Yes|PronType=Prs",
84
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":"Gender=Com|Number=Sing|Number[psor]=Sing|Person=1|Poss=Yes|PronType=Prs",
85
+ "POS=PRON":"",
86
+ "Definite=Ind|Number=Sing|POS=NOUN":"Definite=Ind|Number=Sing",
87
+ "Definite=Ind|Number=Sing|POS=NUM":"Definite=Ind|Number=Sing",
88
+ "Case=Gen|Definite=Ind|Gender=Com|Number=Sing|POS=NOUN":"Case=Gen|Definite=Ind|Gender=Com|Number=Sing",
89
+ "Foreign=Yes|POS=ADV":"Foreign=Yes",
90
+ "POS=NOUN":"",
91
+ "Case=Gen|Definite=Def|Gender=Neut|Number=Sing|POS=NOUN":"Case=Gen|Definite=Def|Gender=Neut|Number=Sing",
92
+ "Gender=Com|Number=Plur|POS=NOUN":"Gender=Com|Number=Plur",
93
+ "Gender=Neut|Number=Sing|POS=PRON|PronType=Int,Rel":"Gender=Neut|Number=Sing|PronType=Int,Rel",
94
  "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs":"Case=Nom|Gender=Com|Number=Plur|Person=1|PronType=Prs",
95
+ "Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs":"Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs",
96
+ "Gender=Com|Number=Sing|POS=PRON|PronType=Ind":"Gender=Com|Number=Sing|PronType=Ind",
97
+ "Case=Gen|Definite=Ind|Gender=Com|Number=Plur|POS=NOUN":"Case=Gen|Definite=Ind|Gender=Com|Number=Plur",
98
+ "Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ":"Degree=Pos|Gender=Neut|Number=Sing",
99
+ "Degree=Sup|POS=ADJ":"Degree=Sup",
100
+ "Degree=Pos|Number=Sing|POS=ADJ":"Degree=Pos|Number=Sing",
101
+ "Mood=Imp|POS=VERB":"Mood=Imp",
102
+ "Case=Nom|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs":"Case=Nom|Gender=Com|Person=2|Polite=Form|PronType=Prs",
103
+ "Case=Acc|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs":"Case=Acc|Gender=Com|Person=2|Polite=Form|PronType=Prs",
104
+ "POS=X":"",
105
  "Case=Gen|Definite=Def|Gender=Com|Number=Plur|POS=NOUN":"Case=Gen|Definite=Def|Gender=Com|Number=Plur",
106
  "Number=Plur|POS=PRON|PronType=Dem":"Number=Plur|PronType=Dem",
107
+ "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs":"Case=Acc|Gender=Com|Number=Plur|Person=1|PronType=Prs",
108
+ "Number=Plur|POS=PRON|PronType=Int,Rel":"Number=Plur|PronType=Int,Rel",
109
+ "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":"Gender=Com|Number=Sing|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
110
  "Degree=Cmp|Number=Plur|POS=ADJ":"Degree=Cmp|Number=Plur",
111
+ "Number=Plur|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":"Number=Plur|Number[psor]=Sing|Person=1|Poss=Yes|PronType=Prs",
112
  "Gender=Com|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":"Gender=Com|Number=Sing|Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs|Style=Form",
113
+ "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs":"Case=Nom|Gender=Com|Number=Sing|Person=2|PronType=Prs",
114
+ "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs":"Case=Acc|Gender=Com|Number=Sing|Person=2|PronType=Prs",
115
+ "Gender=Com|POS=PRON|PronType=Int,Rel":"Gender=Com|PronType=Int,Rel",
116
+ "Case=Gen|Degree=Pos|Number=Plur|POS=ADJ":"Case=Gen|Degree=Pos|Number=Plur",
117
+ "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":"Gender=Neut|Number=Sing|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
118
+ "POS=VERB|VerbForm=Ger":"VerbForm=Ger",
119
+ "Gender=Com|Number=Sing|POS=PRON|PronType=Dem":"Gender=Com|Number=Sing|PronType=Dem",
120
+ "Case=Gen|POS=PRON|PronType=Int,Rel":"Case=Gen|PronType=Int,Rel",
121
+ "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Pass":"Mood=Ind|Tense=Past|VerbForm=Fin|Voice=Pass",
122
+ "Abbr=Yes|POS=X":"Abbr=Yes",
123
+ "Case=Gen|Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN":"Case=Gen|Definite=Ind|Gender=Neut|Number=Plur",
124
+ "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":"Gender=Com|Number=Sing|Number[psor]=Sing|Person=2|Poss=Yes|PronType=Prs",
125
+ "Definite=Ind|Number=Plur|POS=NOUN":"Definite=Ind|Number=Plur",
126
+ "Foreign=Yes|POS=X":"Foreign=Yes",
127
  "Number=Plur|POS=PRON|PronType=Rcp":"Number=Plur|PronType=Rcp",
128
+ "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs":"Case=Nom|Gender=Com|Number=Plur|Person=2|PronType=Prs",
129
  "Case=Gen|Degree=Cmp|POS=ADJ":"Case=Gen|Degree=Cmp",
130
  "Case=Gen|Definite=Def|Gender=Neut|Number=Plur|POS=NOUN":"Case=Gen|Definite=Def|Gender=Neut|Number=Plur",
131
+ "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs":"Case=Acc|Gender=Com|Number=Plur|Person=2|PronType=Prs",
132
+ "Gender=Neut|Number=Sing|POS=PRON|PronType=Dem":"Gender=Neut|Number=Sing|PronType=Dem",
133
  "Number=Plur|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":"Number=Plur|Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs|Style=Form",
134
+ "Gender=Neut|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":"Gender=Neut|Number=Sing|Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs|Style=Form",
135
  "Number=Plur|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":"Number=Plur|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
136
+ "Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs":"Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs",
137
+ "Case=Gen|Number=Plur|POS=PRON|PronType=Rcp":"Case=Gen|Number=Plur|PronType=Rcp",
138
  "POS=DET|Person=2|Polite=Form|Poss=Yes|PronType=Prs":"Person=2|Polite=Form|Poss=Yes|PronType=Prs",
139
+ "POS=SYM":"",
140
+ "POS=DET|PronType=Dem":"PronType=Dem",
141
+ "Gender=Com|Number=Sing|POS=NUM":"Gender=Com|Number=Sing",
142
+ "Number[psor]=Plur|POS=DET|Person=2|Poss=Yes|PronType=Prs":"Number[psor]=Plur|Person=2|Poss=Yes|PronType=Prs",
143
+ "Case=Gen|Number=Plur|POS=VERB|Tense=Past|VerbForm=Part":"Case=Gen|Number=Plur|Tense=Past|VerbForm=Part",
144
  "Definite=Def|Degree=Abs|POS=ADJ":"Definite=Def|Degree=Abs",
145
+ "POS=VERB|Tense=Pres":"Tense=Pres",
146
+ "Definite=Ind|Gender=Neut|Number=Sing|POS=NUM":"Definite=Ind|Gender=Neut|Number=Sing",
147
  "Degree=Abs|POS=ADV":"Degree=Abs",
148
  "Case=Gen|Definite=Def|Degree=Pos|Number=Sing|POS=ADJ":"Case=Gen|Definite=Def|Degree=Pos|Number=Sing",
149
  "Gender=Com|Number=Sing|POS=PRON|PronType=Int,Rel":"Gender=Com|Number=Sing|PronType=Int,Rel",
150
+ "POS=VERB|Tense=Past|VerbForm=Part":"Tense=Past|VerbForm=Part",
151
+ "Definite=Ind|Degree=Sup|Number=Sing|POS=ADJ":"Definite=Ind|Degree=Sup|Number=Sing",
152
  "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":"Gender=Neut|Number=Sing|Number[psor]=Sing|Person=2|Poss=Yes|PronType=Prs",
153
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=PRON|Person=1|Poss=Yes|PronType=Prs":"Gender=Com|Number=Sing|Number[psor]=Sing|Person=1|Poss=Yes|PronType=Prs",
154
  "Number=Plur|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":"Number=Plur|Number[psor]=Sing|Person=2|Poss=Yes|PronType=Prs",
155
+ "Number[psor]=Plur|POS=PRON|Person=3|Poss=Yes|PronType=Prs":"Number[psor]=Plur|Person=3|Poss=Yes|PronType=Prs",
156
+ "Definite=Ind|POS=NOUN":"Definite=Ind",
157
  "Case=Gen|Gender=Com|Number=Sing|POS=DET|PronType=Ind":"Case=Gen|Gender=Com|Number=Sing|PronType=Ind",
158
+ "Definite=Ind|Gender=Com|Number=Sing|POS=NUM":"Definite=Ind|Gender=Com|Number=Sing",
159
+ "Definite=Def|Number=Plur|POS=NOUN":"Definite=Def|Number=Plur",
160
  "Case=Gen|POS=NOUN":"Case=Gen",
161
+ "POS=AUX|Tense=Pres|VerbForm=Part":"Tense=Pres|VerbForm=Part"
162
  },
163
  "labels_pos":{
164
  "AdpType=Prep|POS=ADP":85,
 
175
  "Degree=Pos|Number=Plur|POS=ADJ":84,
176
  "Definite=Ind|Gender=Com|Number=Plur|POS=NOUN":92,
177
  "POS=PUNCT":97,
178
+ "NumType=Ord|POS=ADJ":84,
179
  "POS=CCONJ":89,
180
  "Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN":92,
181
+ "POS=VERB|VerbForm=Inf|Voice=Act":100,
182
+ "Case=Acc|Gender=Neut|Number=Sing|POS=PRON|Person=3|PronType=Prs":95,
183
+ "Degree=Sup|POS=ADV":86,
184
  "Degree=Pos|POS=ADV":86,
185
+ "Gender=Com|Number=Sing|POS=DET|PronType=Ind":90,
186
+ "Number=Plur|POS=DET|PronType=Ind":90,
187
+ "POS=ADP":85,
188
+ "POS=ADV|PartType=Inf":86,
189
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs":95,
190
  "Mood=Ind|POS=AUX|Tense=Past|VerbForm=Fin|Voice=Act":87,
191
+ "Definite=Def|Degree=Pos|Number=Sing|POS=ADJ":84,
192
+ "Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs":90,
193
  "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Act":100,
194
  "POS=ADP|PartType=Inf":85,
195
+ "Definite=Ind|Degree=Pos|Gender=Com|Number=Sing|POS=ADJ":84,
196
+ "NumType=Card|POS=NUM":93,
197
  "Degree=Pos|POS=ADJ":84,
198
+ "Definite=Ind|Number=Sing|POS=AUX|Tense=Past|VerbForm=Part":87,
199
+ "POS=PART|PartType=Inf":94,
200
+ "Case=Acc|POS=PRON|Person=3|PronType=Prs|Reflex=Yes":95,
201
  "Definite=Def|Gender=Com|Number=Plur|POS=NOUN":92,
202
+ "Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN":92,
203
+ "Number[psor]=Plur|POS=DET|Person=3|Poss=Yes|PronType=Prs":90,
204
+ "POS=VERB|Tense=Pres|VerbForm=Part":100,
205
+ "Case=Nom|Number=Plur|POS=PRON|Person=3|PronType=Prs":95,
206
  "Case=Gen|Definite=Def|Gender=Com|Number=Sing|POS=NOUN":92,
207
+ "Definite=Def|Degree=Sup|Number=Plur|POS=ADJ":84,
208
+ "Case=Acc|Number=Plur|POS=PRON|Person=3|PronType=Prs":95,
209
  "POS=AUX|VerbForm=Inf|Voice=Act":87,
210
+ "Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ":84,
211
+ "Definite=Ind|Degree=Cmp|Number=Sing|POS=ADJ":84,
212
+ "Degree=Cmp|POS=ADJ":84,
213
+ "POS=PRON|PartType=Inf":95,
214
+ "Definite=Ind|Degree=Pos|Number=Sing|POS=ADJ":84,
215
+ "Case=Nom|Gender=Com|POS=PRON|PronType=Ind":95,
216
+ "Number=Plur|POS=PRON|PronType=Ind":95,
217
+ "POS=INTJ":91,
218
  "Gender=Com|Number=Sing|POS=DET|PronType=Dem":90,
219
+ "Case=Gen|Number=Plur|POS=DET|PronType=Ind":90,
220
+ "Mood=Ind|POS=VERB|Tense=Pres|VerbForm=Fin|Voice=Pass":100,
221
+ "Definite=Def|Gender=Neut|Number=Plur|POS=NOUN":92,
222
+ "Degree=Cmp|POS=ADV":86,
223
+ "Number=Plur|Number[psor]=Plur|POS=PRON|Person=1|Poss=Yes|PronType=Prs|Style=Form":95,
224
+ "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs":95,
225
+ "Number=Plur|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":90,
226
+ "Case=Gen|POS=PROPN":96,
227
+ "Gender=Neut|Number=Sing|POS=PRON|PronType=Ind":95,
228
+ "Number=Plur|POS=VERB|Tense=Past|VerbForm=Part":100,
229
+ "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":90,
230
+ "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs":95,
231
+ "Definite=Def|Degree=Sup|POS=ADJ":84,
232
  "Gender=Neut|Number=Sing|POS=DET|PronType=Ind":90,
233
+ "Case=Gen|Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN":92,
234
+ "Gender=Neut|Number=Sing|POS=DET|PronType=Dem":90,
235
+ "Definite=Def|Number=Sing|POS=VERB|Tense=Past|VerbForm=Part":100,
236
+ "POS=PRON|PronType=Dem":95,
237
+ "Degree=Pos|Gender=Com|Number=Sing|POS=ADJ":84,
238
+ "Number=Plur|POS=NUM":93,
239
+ "POS=VERB|VerbForm=Inf|Voice=Pass":100,
240
+ "Definite=Def|Degree=Sup|Number=Sing|POS=ADJ":84,
241
+ "Number=Sing|POS=PRON|PronType=Int,Rel":95,
242
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs":95,
243
+ "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":90,
244
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":90,
245
+ "POS=PRON":95,
246
+ "Definite=Ind|Number=Sing|POS=NOUN":92,
247
+ "Definite=Ind|Number=Sing|POS=NUM":93,
248
+ "Case=Gen|Definite=Ind|Gender=Com|Number=Sing|POS=NOUN":92,
249
+ "Foreign=Yes|POS=ADV":86,
250
+ "POS=NOUN":92,
251
+ "Case=Gen|Definite=Def|Gender=Neut|Number=Sing|POS=NOUN":92,
252
+ "Gender=Com|Number=Plur|POS=NOUN":92,
253
+ "Gender=Neut|Number=Sing|POS=PRON|PronType=Int,Rel":95,
254
  "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs":95,
255
+ "Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs":90,
256
+ "Gender=Com|Number=Sing|POS=PRON|PronType=Ind":95,
257
+ "Case=Gen|Definite=Ind|Gender=Com|Number=Plur|POS=NOUN":92,
258
+ "Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ":84,
259
+ "Degree=Sup|POS=ADJ":84,
260
+ "Degree=Pos|Number=Sing|POS=ADJ":84,
261
+ "Mood=Imp|POS=VERB":100,
262
+ "Case=Nom|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs":95,
263
+ "Case=Acc|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs":95,
264
+ "POS=X":101,
265
  "Case=Gen|Definite=Def|Gender=Com|Number=Plur|POS=NOUN":92,
266
  "Number=Plur|POS=PRON|PronType=Dem":95,
267
+ "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs":95,
268
+ "Number=Plur|POS=PRON|PronType=Int,Rel":95,
269
+ "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":90,
270
  "Degree=Cmp|Number=Plur|POS=ADJ":84,
271
+ "Number=Plur|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":90,
272
  "Gender=Com|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":90,
273
+ "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs":95,
274
+ "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs":95,
275
+ "Gender=Com|POS=PRON|PronType=Int,Rel":95,
276
+ "Case=Gen|Degree=Pos|Number=Plur|POS=ADJ":84,
277
+ "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":95,
278
+ "POS=VERB|VerbForm=Ger":100,
279
+ "Gender=Com|Number=Sing|POS=PRON|PronType=Dem":95,
280
+ "Case=Gen|POS=PRON|PronType=Int,Rel":95,
281
+ "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Pass":100,
282
+ "Abbr=Yes|POS=X":101,
283
+ "Case=Gen|Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN":92,
284
+ "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":90,
285
+ "Definite=Ind|Number=Plur|POS=NOUN":92,
286
+ "Foreign=Yes|POS=X":101,
287
  "Number=Plur|POS=PRON|PronType=Rcp":95,
288
+ "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs":95,
289
  "Case=Gen|Degree=Cmp|POS=ADJ":84,
290
  "Case=Gen|Definite=Def|Gender=Neut|Number=Plur|POS=NOUN":92,
291
+ "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs":95,
292
+ "Gender=Neut|Number=Sing|POS=PRON|PronType=Dem":95,
293
  "Number=Plur|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":90,
294
+ "Gender=Neut|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":90,
295
  "Number=Plur|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":95,
296
+ "Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs":95,
297
+ "Case=Gen|Number=Plur|POS=PRON|PronType=Rcp":95,
298
  "POS=DET|Person=2|Polite=Form|Poss=Yes|PronType=Prs":90,
299
+ "POS=SYM":99,
300
+ "POS=DET|PronType=Dem":90,
301
+ "Gender=Com|Number=Sing|POS=NUM":93,
302
+ "Number[psor]=Plur|POS=DET|Person=2|Poss=Yes|PronType=Prs":90,
303
+ "Case=Gen|Number=Plur|POS=VERB|Tense=Past|VerbForm=Part":100,
304
  "Definite=Def|Degree=Abs|POS=ADJ":84,
305
+ "POS=VERB|Tense=Pres":100,
306
+ "Definite=Ind|Gender=Neut|Number=Sing|POS=NUM":93,
307
  "Degree=Abs|POS=ADV":86,
308
  "Case=Gen|Definite=Def|Degree=Pos|Number=Sing|POS=ADJ":84,
309
  "Gender=Com|Number=Sing|POS=PRON|PronType=Int,Rel":95,
310
+ "POS=VERB|Tense=Past|VerbForm=Part":100,
311
+ "Definite=Ind|Degree=Sup|Number=Sing|POS=ADJ":84,
312
  "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":90,
313
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=PRON|Person=1|Poss=Yes|PronType=Prs":95,
314
  "Number=Plur|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":90,
315
+ "Number[psor]=Plur|POS=PRON|Person=3|Poss=Yes|PronType=Prs":95,
316
+ "Definite=Ind|POS=NOUN":92,
317
  "Case=Gen|Gender=Com|Number=Sing|POS=DET|PronType=Ind":90,
318
+ "Definite=Ind|Gender=Com|Number=Sing|POS=NUM":93,
319
+ "Definite=Def|Number=Plur|POS=NOUN":92,
320
  "Case=Gen|POS=NOUN":92,
321
+ "POS=AUX|Tense=Pres|VerbForm=Part":87
322
+ },
323
+ "overwrite":true
 
324
  }
morphologizer/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:23f57fc1675443668f91cec6eebb387f059ca61d03964b3546612a8127a44bf6
3
- size 644296
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bb86086f9d66038f2aa26709d98c16c55019bee46b71a06aed3b849e91b32ac7
3
+ size 648448
ner/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:a37d9bbf3eda077a5e3e488554779e4cf6e852a62f2a0b8bc8b70c8a638d8dc0
3
  size 291498
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c8ae79ba510d4fd1ea694860d0e3988e802d2117641e58945ecd22dd0316a6dd
3
  size 291498
ner/moves CHANGED
@@ -1 +1 @@
1
- ��moves��{"0":{},"1":{"PER":2146,"MISC":1273,"ORG":1267,"LOC":1144},"2":{"PER":2146,"MISC":1273,"ORG":1267,"LOC":1144},"3":{"PER":2146,"MISC":1273,"ORG":1267,"LOC":1144},"4":{"PER":2146,"MISC":1273,"ORG":1267,"LOC":1144,"":1},"5":{"":1}}�cfg��neg_key�
 
1
+ ��moves��{"0":{},"1":{"PER":1361,"ORG":943,"MISC":826,"LOC":768},"2":{"PER":1361,"ORG":943,"MISC":826,"LOC":768},"3":{"PER":1361,"ORG":943,"MISC":826,"LOC":768},"4":{"PER":1361,"ORG":943,"MISC":826,"LOC":768,"":1},"5":{"":1}}�cfg��neg_key�
parser/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:1e771118fbecf3915fd2e7788de4d712a3aacece2e084415887537fce51a3947
3
- size 521693
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8b730ca95666b48456598fd9d9cb8b32b541a62b966389ba845fa7404f46002a
3
+ size 1219859
parser/moves CHANGED
@@ -1 +1 @@
1
- ��moves�2{"0":{"":41514},"1":{"":34292},"2":{"case":7489,"nsubj":6009,"det":4334,"amod":3968,"advmod":3657,"mark":3529,"aux":2432,"cc":2261,"punct":2182,"cop":1329,"obl":894,"nummod":799,"nmod:poss":651,"nmod":460,"expl":291,"ccomp":202,"obj":195,"xcomp":122,"case||nmod":73,"obl:tmod":53,"dep":49,"acl:relcl":43},"3":{"punct":8600,"obl":3949,"obj":3758,"nmod":3565,"conj":2743,"advmod":2095,"flat":1294,"nsubj":1172,"acl:relcl":1131,"advcl":808,"amod":629,"obl:loc":467,"fixed":390,"dep":322,"xcomp":272,"appos":268,"compound:prt":261,"ccomp":252,"acl:relcl||nsubj":237,"case":202,"nummod":167,"list":161,"nmod:poss":156,"punct||conj":151,"mark":137,"cc":135,"iobj":107,"expl":77,"cop":69,"nmod||case":60,"aux":48,"obl:tmod":45,"cc||case":43,"advcl||advmod":43,"cc||conj":40,"case||obl":38,"punct||case":33},"4":{"ROOT":4367}}�cfg��neg_key�
 
1
+ ��moves��{"0":{"":30710},"1":{"":22084},"2":{"case":5238,"nsubj":4163,"punct":3257,"det":3028,"amod":2815,"advmod":2482,"mark":2317,"aux":1748,"cc":1610,"cop":823,"obl":627,"nummod":620,"nmod:poss":457,"nmod":384,"expl":193,"obj":188,"ccomp":155,"advcl":110,"xcomp":81,"case||nmod":45,"dep":32,"obl:tmod":31},"3":{"punct":4355,"obl":2759,"obj":2659,"nmod":2503,"conj":1923,"advmod":1246,"flat":886,"nsubj":805,"acl:relcl":800,"advcl":744,"amod":415,"xcomp":307,"advmod:lmod":273,"fixed":267,"dep":218,"compound:prt":211,"appos":187,"ccomp":177,"acl:relcl||nsubj":144,"case":130,"nmod:poss":112,"mark":103,"iobj":99,"nummod":93,"list":86,"cc":72,"expl":55,"cop":40,"obl:lmod":35,"obl:tmod":34,"cc||case":31},"4":{"ROOT":2970}}�cfg��neg_key�
span_resolver/cfg ADDED
@@ -0,0 +1,3 @@
1
+ {
2
+ "nI":1024
3
+ }
span_resolver/model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9bfe5b3362ecdf657add5ec93ccaa0d26dacc396d2251816bb6717e67e744a3f
3
+ size 9810117
tagger/cfg ADDED
@@ -0,0 +1,23 @@
1
+ {
2
+ "labels":[
3
+ "ADJ",
4
+ "ADP",
5
+ "ADV",
6
+ "AUX",
7
+ "CCONJ",
8
+ "DET",
9
+ "INTJ",
10
+ "NOUN",
11
+ "NUM",
12
+ "PART",
13
+ "PRON",
14
+ "PROPN",
15
+ "PUNCT",
16
+ "SCONJ",
17
+ "SYM",
18
+ "VERB",
19
+ "X"
20
+ ],
21
+ "neg_prefix":"!",
22
+ "overwrite":false
23
+ }
tagger/model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2eb750a8d2d34ce50a457f972731c0dda7229e46a98cbbe87ec46acfe089ca4c
3
+ size 70342
tokenizer CHANGED
The diff for this file is too large to render.
 
trainable_lemmatizer/cfg ADDED
@@ -0,0 +1,348 @@
1
+ {
2
+ "labels":[
3
+ 1,
4
+ 2,
5
+ 4,
6
+ 6,
7
+ 8,
8
+ 10,
9
+ 12,
10
+ 14,
11
+ 16,
12
+ 18,
13
+ 20,
14
+ 24,
15
+ 26,
16
+ 29,
17
+ 30,
18
+ 34,
19
+ 36,
20
+ 38,
21
+ 42,
22
+ 44,
23
+ 46,
24
+ 48,
25
+ 50,
26
+ 52,
27
+ 54,
28
+ 56,
29
+ 57,
30
+ 60,
31
+ 63,
32
+ 65,
33
+ 67,
34
+ 69,
35
+ 71,
36
+ 73,
37
+ 75,
38
+ 76,
39
+ 78,
40
+ 81,
41
+ 83,
42
+ 84,
43
+ 86,
44
+ 88,
45
+ 92,
46
+ 96,
47
+ 98,
48
+ 100,
49
+ 103,
50
+ 106,
51
+ 108,
52
+ 110,
53
+ 113,
54
+ 115,
55
+ 117,
56
+ 119,
57
+ 121,
58
+ 124,
59
+ 125,
60
+ 127,
61
+ 129,
62
+ 131,
63
+ 133,
64
+ 134,
65
+ 138,
66
+ 140,
67
+ 142,
68
+ 144,
69
+ 146,
70
+ 148,
71
+ 151,
72
+ 153,
73
+ 155,
74
+ 156,
75
+ 159,
76
+ 160,
77
+ 162,
78
+ 164,
79
+ 166,
80
+ 167,
81
+ 168,
82
+ 170,
83
+ 172,
84
+ 175,
85
+ 177,
86
+ 180,
87
+ 182,
88
+ 185,
89
+ 188,
90
+ 190,
91
+ 191,
92
+ 194,
93
+ 197,
94
+ 199,
95
+ 201,
96
+ 205,
97
+ 208,
98
+ 211,
99
+ 212,
100
+ 213,
101
+ 215,
102
+ 217,
103
+ 219,
104
+ 220,
105
+ 221,
106
+ 223,
107
+ 224,
108
+ 226,
109
+ 229,
110
+ 231,
111
+ 232,
112
+ 233,
113
+ 236,
114
+ 238,
115
+ 240,
116
+ 242,
117
+ 244,
118
+ 246,
119
+ 249,
120
+ 250,
121
+ 252,
122
+ 255,
123
+ 256,
124
+ 257,
125
+ 228,
126
+ 259,
127
+ 262,
128
+ 264,
129
+ 266,
130
+ 269,
131
+ 271,
132
+ 274,
133
+ 276,
134
+ 279,
135
+ 281,
136
+ 283,
137
+ 284,
138
+ 285,
139
+ 286,
140
+ 288,
141
+ 289,
142
+ 290,
143
+ 291,
144
+ 293,
145
+ 294,
146
+ 297,
147
+ 298,
148
+ 300,
149
+ 302,
150
+ 303,
151
+ 305,
152
+ 307,
153
+ 308,
154
+ 309,
155
+ 311,
156
+ 312,
157
+ 314,
158
+ 316,
159
+ 318,
160
+ 321,
161
+ 322,
162
+ 323,
163
+ 324,
164
+ 325,
165
+ 327,
166
+ 329,
167
+ 331,
168
+ 333,
169
+ 334,
170
+ 336,
171
+ 338,
172
+ 340,
173
+ 341,
174
+ 343,
175
+ 345,
176
+ 348,
177
+ 351,
178
+ 353,
179
+ 355,
180
+ 356,
181
+ 357,
182
+ 360,
183
+ 362,
184
+ 366,
185
+ 368,
186
+ 370,
187
+ 372,
188
+ 374,
189
+ 376,
190
+ 377,
191
+ 379,
192
+ 381,
193
+ 382,
194
+ 383,
195
+ 384,
196
+ 386,
197
+ 388,
198
+ 389,
199
+ 391,
200
+ 392,
201
+ 395,
202
+ 396,
203
+ 398,
204
+ 400,
205
+ 401,
206
+ 402,
207
+ 403,
208
+ 405,
209
+ 407,
210
+ 408,
211
+ 409,
212
+ 410,
213
+ 412,
214
+ 414,
215
+ 415,
216
+ 418,
217
+ 419,
218
+ 421,
219
+ 422,
220
+ 425,
221
+ 427,
222
+ 428,
223
+ 430,
224
+ 432,
225
+ 433,
226
+ 435,
227
+ 436,
228
+ 438,
229
+ 440,
230
+ 442,
231
+ 443,
232
+ 444,
233
+ 445,
234
+ 446,
235
+ 448,
236
+ 451,
237
+ 452,
238
+ 453,
239
+ 455,
240
+ 456,
241
+ 458,
242
+ 461,
243
+ 463,
244
+ 464,
245
+ 466,
246
+ 467,
247
+ 468,
248
+ 469,
249
+ 470,
250
+ 473,
251
+ 475,
252
+ 479,
253
+ 481,
254
+ 482,
255
+ 486,
256
+ 488,
257
+ 490,
258
+ 492,
259
+ 493,
260
+ 495,
261
+ 497,
262
+ 502,
263
+ 503,
264
+ 504,
265
+ 505,
266
+ 508,
267
+ 509,
268
+ 510,
269
+ 511,
270
+ 512,
271
+ 513,
272
+ 514,
273
+ 515,
274
+ 516,
275
+ 518,
276
+ 519,
277
+ 521,
278
+ 522,
279
+ 524,
280
+ 528,
281
+ 529,
282
+ 530,
283
+ 534,
284
+ 535,
285
+ 537,
286
+ 539,
287
+ 542,
288
+ 544,
289
+ 547,
290
+ 549,
291
+ 550,
292
+ 551,
293
+ 553,
294
+ 555,
295
+ 556,
296
+ 557,
297
+ 559,
298
+ 560,
299
+ 562,
300
+ 566,
301
+ 567,
302
+ 568,
303
+ 570,
304
+ 573,
305
+ 576,
306
+ 579,
307
+ 581,
308
+ 583,
309
+ 584,
310
+ 585,
311
+ 587,
312
+ 590,
313
+ 592,
314
+ 595,
315
+ 597,
316
+ 598,
317
+ 599,
318
+ 600,
319
+ 604,
320
+ 606,
321
+ 608,
322
+ 609,
323
+ 611,
324
+ 612,
325
+ 613,
326
+ 614,
327
+ 616,
328
+ 618,
329
+ 619,
330
+ 622,
331
+ 623,
332
+ 624,
333
+ 625,
334
+ 626,
335
+ 627,
336
+ 628,
337
+ 632,
338
+ 635,
339
+ 636,
340
+ 637,
341
+ 639,
342
+ 640,
343
+ 641,
344
+ 642,
345
+ 643,
346
+ 644
347
+ ]
348
+ }
trainable_lemmatizer/model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d61d949fe113253b9ba1fdcc196ebb83925fc85232068a0a48a84a46a7f7ae75
3
+ size 1411053
trainable_lemmatizer/trees ADDED
Binary file (68.7 kB).
 
transformer/model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5e876008aaa91bb3bdc8e5902a85993adcfc8036e7b96307e4d70159cc590f2f
3
+ size 1422087053
transformer/model/config.json DELETED
@@ -1,27 +0,0 @@
1
- {
2
- "_name_or_path": "xlm-roberta-large",
3
- "architectures": [
4
- "XLMRobertaForMaskedLM"
5
- ],
6
- "attention_probs_dropout_prob": 0.1,
7
- "bos_token_id": 0,
8
- "eos_token_id": 2,
9
- "gradient_checkpointing": false,
10
- "hidden_act": "gelu",
11
- "hidden_dropout_prob": 0.1,
12
- "hidden_size": 1024,
13
- "initializer_range": 0.02,
14
- "intermediate_size": 4096,
15
- "layer_norm_eps": 1e-05,
16
- "max_position_embeddings": 514,
17
- "model_type": "xlm-roberta",
18
- "num_attention_heads": 16,
19
- "num_hidden_layers": 24,
20
- "output_past": true,
21
- "pad_token_id": 1,
22
- "position_embedding_type": "absolute",
23
- "transformers_version": "4.5.1",
24
- "type_vocab_size": 1,
25
- "use_cache": true,
26
- "vocab_size": 250002
27
- }
transformer/model/special_tokens_map.json DELETED
@@ -1 +0,0 @@
1
- {"bos_token": "<s>", "eos_token": "</s>", "unk_token": "<unk>", "sep_token": "</s>", "pad_token": "<pad>", "cls_token": "<s>", "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": false}}
transformer/model/tokenizer_config.json DELETED
@@ -1 +0,0 @@
1
- {"bos_token": "<s>", "eos_token": "</s>", "sep_token": "</s>", "cls_token": "<s>", "unk_token": "<unk>", "pad_token": "<pad>", "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": true, "__type": "AddedToken"}, "strip_accents": false, "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "xlm-roberta-large"}
vocab/lookups.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:a6f4a94131759bf84baec98b3347bcef57ffb2d6712f7f3b8f611e9ef4b3df35
3
- size 20402
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:76be8b528d0075f7aae98d6fa57a6d3c83ae480a8469e668d7b0af968995ac71
3
+ size 1
vocab/strings.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:5b50a86603f748496e4fd87a8aaa203a32bf82d4b3768bf54187ff40de3ca6f9
3
- size 460120
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ec97f460233cd7e225ffce56c87cfe570210c27ccdb209855f13ebba48c7a1bf
3
+ size 544073
vocab/vectors.cfg ADDED
@@ -0,0 +1,3 @@
1
+ {
2
+ "mode":"default"
3
+ }