KennethEnevoldsen committed on
Commit
aed2231
1 Parent(s): 61a54ab

Added version 0.2.0

.gitattributes CHANGED
@@ -20,3 +20,4 @@
  *strings.json filter=lfs diff=lfs merge=lfs -text
  vectors filter=lfs diff=lfs merge=lfs -text
  model filter=lfs diff=lfs merge=lfs -text
+ entity_linker/kb/* filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,87 +1,37 @@
- ---
- tags:
- - spacy
- - token-classification
- language:
- - da
- license: apache-2.0
- model-index:
- - name: da_dacy_medium_trf
-   results:
-   - task:
-       name: NER
-       type: token-classification
-     metrics:
-     - name: NER Precision
-       type: precision
-       value: 0.817047817
-     - name: NER Recall
-       type: recall
-       value: 0.81875
-     - name: NER F Score
-       type: f_score
-       value: 0.8178980229
-   - task:
-       name: SENTER
-       type: token-classification
-     metrics:
-     - name: SENTER Precision
-       type: precision
-       value: 0.873015873
-     - name: SENTER Recall
-       type: recall
-       value: 0.8776595745
-     - name: SENTER F Score
-       type: f_score
-       value: 0.875331565
-   - task:
-       name: UNLABELED_DEPENDENCIES
-       type: token-classification
-     metrics:
-     - name: Unlabeled Dependencies Accuracy
-       type: accuracy
-       value: 0.8714971531
-   - task:
-       name: LABELED_DEPENDENCIES
-       type: token-classification
-     metrics:
-     - name: Labeled Dependencies Accuracy
-       type: accuracy
-       value: 0.8714971531
- ---

  <a href="https://github.com/centre-for-humanities-computing/Dacy"><img src="https://centre-for-humanities-computing.github.io/DaCy/_static/icon.png" width="175" height="175" align="right" /></a>

- # DaCy medium transformer

  DaCy is a Danish language processing framework with state-of-the-art pipelines as well as functionality for analysing Danish pipelines.
- DaCy's largest pipeline has achieved state-of-the-art performance on named entity recognition, part-of-speech tagging and dependency
- parsing for Danish on the DaNE dataset. Check out the [DaCy repository](https://github.com/centre-for-humanities-computing/DaCy) for material on how to use DaCy and reproduce the results.
  DaCy also contains guides on usage of the package as well as behavioural test for biases and robustness of Danish NLP pipelines.
-

  | Feature | Description |
  | --- | --- |
  | **Name** | `da_dacy_medium_trf` |
- | **Version** | `0.1.0` |
- | **spaCy** | `>=3.1.1,<3.2.0` |
- | **Default Pipeline** | `transformer`, `morphologizer`, `parser`, `attribute_ruler`, `lemmatizer`, `ner` |
- | **Components** | `transformer`, `morphologizer`, `parser`, `attribute_ruler`, `lemmatizer`, `ner` |
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
- | **Sources** | [UD Danish DDT v2.5](https://github.com/UniversalDependencies/UD_Danish-DDT) (Johannsen, Anders; Martínez Alonso, Héctor; Plank, Barbara)<br />[DaNE](https://github.com/alexandrainst/danlp/blob/master/docs/datasets.md#danish-dependency-treebank-dane) (Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders Søgaard)<br />[Maltehb/danish-bert-botxo](https://huggingface.co/Maltehb/danish-bert-botxo) (BotXO.ai) |
  | **License** | `Apache-2.0 License` |
- | **Author** | [Centre for Humanities Computing Aarhus](https://chcaa.io/#/) |

  ### Label Scheme

  <details>

- <summary>View label scheme (192 labels for 3 components)</summary>

  | Component | Labels |
  | --- | --- |
- | **`morphologizer`** | `AdpType=Prep\|POS=ADP`, `Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=AUX\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=PROPN`, `Definite=Ind\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `POS=SCONJ`, `Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=ADV`, `Number=Plur\|POS=DET\|PronType=Dem`, `Degree=Pos\|Number=Plur\|POS=ADJ`, `Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `POS=PUNCT`, `POS=CCONJ`, `Definite=Ind\|Degree=Cmp\|Number=Sing\|POS=ADJ`, `Degree=Cmp\|POS=ADJ`, `POS=PRON\|PartType=Inf`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Definite=Ind\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Case=Acc\|Gender=Neut\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Dem`, `Degree=Pos\|POS=ADV`, `Definite=Def\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `POS=PRON\|PronType=Dem`, `NumType=Card\|POS=NUM`, `Definite=Ind\|Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `NumType=Ord\|POS=ADJ`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Mood=Ind\|POS=AUX\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `POS=VERB\|VerbForm=Inf\|Voice=Act`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `POS=NOUN`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Pass`, `POS=ADP\|PartType=Inf`, `Degree=Pos\|POS=ADJ`, `Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, 
`Case=Gen\|Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `POS=AUX\|VerbForm=Inf\|Voice=Act`, `Definite=Ind\|Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Dem`, `Number=Plur\|POS=DET\|PronType=Ind`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Ind`, `Case=Acc\|POS=PRON\|Person=3\|PronType=Prs\|Reflex=Yes`, `POS=PART\|PartType=Inf`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Ind`, `Case=Acc\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Case=Nom\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Nom\|Gender=Com\|POS=PRON\|PronType=Ind`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Ind`, `Mood=Imp\|POS=VERB`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Definite=Ind\|Number=Sing\|POS=AUX\|Tense=Past\|VerbForm=Part`, `POS=X`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `POS=VERB\|Tense=Pres\|VerbForm=Part`, `Number=Plur\|POS=PRON\|PronType=Int,Rel`, `POS=VERB\|VerbForm=Inf\|Voice=Pass`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Degree=Cmp\|POS=ADV`, `POS=ADV\|PartType=Inf`, `Degree=Sup\|POS=ADV`, `Number=Plur\|POS=PRON\|PronType=Dem`, `Number=Plur\|POS=PRON\|PronType=Ind`, `Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|POS=PROPN`, `POS=ADP`, `Degree=Cmp\|Number=Plur\|POS=ADJ`, `Definite=Def\|Degree=Sup\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Degree=Pos\|Number=Sing\|POS=ADJ`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Gender=Com\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, 
`Number=Plur\|POS=PRON\|PronType=Rcp`, `Case=Gen\|Degree=Cmp\|POS=ADJ`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Number[psor]=Plur\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, `POS=INTJ`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Number=Plur\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Definite=Def\|Degree=Sup\|Number=Plur\|POS=ADJ`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Definite=Ind\|Number=Sing\|POS=NOUN`, `Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Number=Plur\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `POS=SYM`, `Case=Nom\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `Degree=Sup\|POS=ADJ`, `Number=Plur\|POS=DET\|PronType=Ind\|Style=Arch`, `Case=Gen\|Gender=Com\|Number=Sing\|POS=DET\|PronType=Dem`, `Foreign=Yes\|POS=X`, `POS=DET\|Person=2\|Polite=Form\|Poss=Yes\|PronType=Prs`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Dem`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Case=Gen\|POS=PRON\|PronType=Int,Rel`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Dem`, `Abbr=Yes\|POS=X`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `Definite=Def\|Degree=Abs\|POS=ADJ`, `Definite=Ind\|Degree=Sup\|Number=Sing\|POS=ADJ`, 
`Definite=Ind\|POS=NOUN`, `Gender=Com\|Number=Plur\|POS=NOUN`, `Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|POS=PRON\|PronType=Int,Rel`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Degree=Abs\|POS=ADV`, `POS=VERB\|VerbForm=Ger`, `POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Degree=Sup\|Number=Sing\|POS=ADJ`, `Number=Plur\|Number[psor]=Plur\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Gen\|Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Case=Gen\|Degree=Pos\|Number=Plur\|POS=ADJ`, `Case=Acc\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `POS=VERB\|Tense=Pres`, `Case=Gen\|Number=Plur\|POS=DET\|PronType=Ind`, `Number[psor]=Plur\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `POS=PRON\|Person=2\|Polite=Form\|Poss=Yes\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `POS=AUX\|Tense=Pres\|VerbForm=Part`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Pass`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Degree=Sup\|Number=Plur\|POS=ADJ`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Definite=Ind\|Number=Plur\|POS=NOUN`, `Case=Gen\|Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Mood=Imp\|POS=AUX`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs`, `Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `Definite=Def\|Gender=Com\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Case=Gen\|POS=NOUN`, `Number[psor]=Plur\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `POS=DET\|PronType=Dem`, 
`Definite=Def\|Number=Plur\|POS=NOUN` |
- | **`parser`** | `ROOT`, `acl:relcl`, `advcl`, `advmod`, `amod`, `appos`, `aux`, `case`, `cc`, `ccomp`, `compound:prt`, `conj`, `cop`, `dep`, `det`, `expl`, `fixed`, `flat`, `iobj`, `list`, `mark`, `nmod`, `nmod:poss`, `nsubj`, `nummod`, `obj`, `obl`, `obl:loc`, `obl:tmod`, `punct`, `xcomp` |
  | **`ner`** | `LOC`, `MISC`, `ORG`, `PER` |

  </details>
@@ -90,104 +40,36 @@ DaCy also contains guides on usage of the package as well as behavioural test fo

  | Type | Score |
  | --- | --- |
- | `POS_ACC` | 97.44 |
- | `MORPH_ACC` | 97.24 |
- | `DEP_UAS` | 87.15 |
- | `DEP_LAS` | 83.97 |
- | `SENTS_P` | 87.30 |
- | `SENTS_R` | 87.77 |
- | `SENTS_F` | 87.53 |
- | `LEMMA_ACC` | 84.91 |
- | `ENTS_F` | 81.79 |
- | `ENTS_P` | 81.70 |
- | `ENTS_R` | 81.88 |
- | `TRANSFORMER_LOSS` | 1224302.39 |
- | `MORPHOLOGIZER_LOSS` | 388869.90 |
- | `PARSER_LOSS` | 7861802.70 |
- | `NER_LOSS` | 68503.20 |
-
-
- ## Bias and Robustness
-
- Besides the validation done by spaCy on the DaNE test set, DaCy also provides a series of augmentations to the DaNE test set to see how well the models deal with these types of augmentations.
- These can be seen as behavioural probes akin to the NLP checklist.
-
- ### Deterministic Augmentations
- Deterministic augmentations are augmentations which always yield the same result.
-
- | Augmentation | Part-of-speech tagging (Accuracy) | Morphological tagging (Accuracy) | Dependency Parsing (UAS) | Dependency Parsing (LAS) | Sentence segmentation (F1) | Lemmatization (Accuracy) | Named entity recognition (F1) |
- | --- | --- | --- | --- | --- | --- | --- | --- |
- | No augmentation | 0.98 | 0.975 | 0.888 | 0.857 | 0.936 | 0.844 | 0.765 |
- | Æøå Augmentation | 0.963 | 0.955 | 0.88 | 0.844 | 0.944 | 0.754 | 0.712 |
- | Lowercase | 0.98 | 0.975 | 0.888 | 0.857 | 0.936 | 0.848 | 0.765 |
- | No Spacing | 0.229 | 0.229 | 0.004 | 0.004 | 0.683 | 0.225 | 0.058 |
- | Abbreviated first names | 0.976 | 0.974 | 0.885 | 0.854 | 0.934 | 0.845 | 0.741 |
- | Input size augmentation 5 sentences | 0.978 | 0.973 | 0.88 | 0.85 | 0.883 | 0.844 | 0.77 |
- | Input size augmentation 10 sentences | 0.977 | 0.973 | 0.878 | 0.847 | 0.872 | 0.844 | 0.768 |
-
-
-
- ### Stochastic Augmentations
- Stochastic augmentations are augmentations which are repeated multiple times to estimate the effect of the augmentation.
-
- | Augmentation | Part-of-speech tagging (Accuracy) | Morphological tagging (Accuracy) | Dependency Parsing (UAS) | Dependency Parsing (LAS) | Sentence segmentation (F1) | Lemmatization (Accuracy) | Named entity recognition (F1) |
- | --- | --- | --- | --- | --- | --- | --- | --- |
- | Keystroke errors 2% | 0.936 (0.002) | 0.934 (0.002) | 0.836 (0.002) | 0.795 (0.002) | 0.889 (0.002) | 0.773 (0.002) | 0.627 (0.002) |
- | Keystroke errors 5% | 0.869 (0.003) | 0.873 (0.003) | 0.753 (0.003) | 0.696 (0.003) | 0.829 (0.003) | 0.68 (0.003) | 0.487 (0.003) |
- | Keystroke errors 15% | 0.647 (0.007) | 0.684 (0.007) | 0.5 (0.007) | 0.417 (0.007) | 0.664 (0.007) | 0.46 (0.007) | 0.256 (0.007) |
- | Danish names | 0.978 (0.0) | 0.975 (0.0) | 0.885 (0.0) | 0.855 (0.0) | 0.934 (0.0) | 0.847 (0.0) | 0.771 (0.0) |
- | Muslim names | 0.978 (0.0) | 0.975 (0.0) | 0.886 (0.0) | 0.855 (0.0) | 0.935 (0.0) | 0.847 (0.0) | 0.749 (0.0) |
- | Female names | 0.979 (0.0) | 0.975 (0.0) | 0.886 (0.0) | 0.856 (0.0) | 0.933 (0.0) | 0.847 (0.0) | 0.775 (0.0) |
- | Male names | 0.978 (0.0) | 0.975 (0.0) | 0.885 (0.0) | 0.855 (0.0) | 0.933 (0.0) | 0.847 (0.0) | 0.773 (0.0) |
- | Spacing Augmentation 5% | 0.941 (0.002) | 0.937 (0.002) | 0.78 (0.002) | 0.751 (0.002) | 0.905 (0.002) | 0.812 (0.002) | 0.701 (0.002) |
-
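The stochastic rows above report a mean with a standard deviation in parentheses; according to the augmenter descriptions, each stochastic augmentation is repeated 20 times. As an illustration only, the sketch below pairs a toy keystroke-error augmenter with that aggregation. `NEIGHBOURS` is a small hypothetical excerpt of a Danish QWERTY layout, `toy_evaluate` is a hypothetical stand-in for scoring a model on the augmented DaNE test set, and each eligible character is replaced independently with probability `prob`, which only approximates "replacing 2% of keys". This is not DaCy's augmenter or evaluation code, and the sketch uses the sample standard deviation, which may differ from what was used for the table.

```python
import random
import statistics
from typing import Callable

# Hypothetical excerpt of neighbouring keys on a Danish QWERTY keyboard;
# a real augmenter would cover the full layout.
NEIGHBOURS = {
    "a": "qwsz",
    "e": "wsdr",
    "n": "bhjm",
    "r": "edft",
    "s": "awedxz",
    "t": "rfgy",
    "æ": "løpå",
}


def keystroke_errors(text: str, prob: float, rng: random.Random) -> str:
    """Replace each mappable character with a neighbouring key with probability `prob`."""
    chars = []
    for ch in text:
        neighbours = NEIGHBOURS.get(ch)
        if neighbours and rng.random() < prob:
            chars.append(rng.choice(neighbours))
        else:
            chars.append(ch)
    return "".join(chars)


def toy_evaluate(seed: int, text: str = "en stor rest af sne", prob: float = 0.05) -> float:
    """Hypothetical stand-in for a model score: the fraction of characters left unchanged."""
    augmented = keystroke_errors(text, prob, random.Random(seed))
    return sum(a == b for a, b in zip(text, augmented)) / len(text)


def repeated_score(evaluate: Callable[[int], float], n_repetitions: int = 20) -> tuple[float, float]:
    """Run a seeded stochastic evaluation repeatedly and return (mean, sample std)."""
    scores = [evaluate(seed) for seed in range(n_repetitions)]
    return statistics.mean(scores), statistics.stdev(scores)


def format_cell(mean: float, std: float) -> str:
    """Format a result the way the table does: mean (std)."""
    return f"{mean:.3f} ({std:.3f})"


mean, std = repeated_score(toy_evaluate)
print(format_cell(mean, std))
```

Seeding each repetition with its index keeps the estimate reproducible while still sampling different augmentations per run.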
- <details>
-
- <summary> Description of Augmenters </summary>
-
-
-
- **No augmentation:**
- Applies no augmentation to the DaNE test set.
-
- **Æøå Augmentation:**
- This augmentation replaces æ, ø, and å with their spelling variants ae, oe, and aa, respectively.
-
- **Lowercase:**
- This augmentation lowercases all text.
-
- **No Spacing:**
- This augmentation removes all spacing from the text.
-
- **Abbreviated first names:**
- This augmentation abbreviates the first names of entities. For instance, 'Kenneth Enevoldsen' would become 'K. Enevoldsen'.
-
- **Keystroke errors 2%:**
- This augmentation simulates keystroke errors by replacing 2% of keys with a neighbouring key on a Danish QWERTY keyboard. As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.
-
- **Keystroke errors 5%:**
- This augmentation simulates keystroke errors by replacing 5% of keys with a neighbouring key on a Danish QWERTY keyboard. As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.
-
- **Keystroke errors 15%:**
- This augmentation simulates keystroke errors by replacing 15% of keys with a neighbouring key on a Danish QWERTY keyboard. As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.
-
- **Danish names:**
- This augmentation replaces all names with Danish names derived from Danmarks Statistik (2021). As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.
-
- **Muslim names:**
- This augmentation replaces all names with Muslim names derived from Meldgaard (2005). As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.
-
- **Female names:**
- This augmentation replaces all names with Danish female names derived from Danmarks Statistik (2021). As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.
-
- **Male names:**
- This augmentation replaces all names with Danish male names derived from Danmarks Statistik (2021). As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.
-
- **Spacing Augmentation 5%:**
- This augmentation replaces all names with Danish male names derived from Danmarks Statistik (2021). As this augmentation is stochastic, it is repeated 20 times to obtain a consistent estimate, and the mean is provided with its standard deviation in parentheses.
- </details>
- <br />
-
-
- ### Hardware
- This was run and trained on a Quadro RTX 8000 GPU.
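The deterministic augmenters described in the removed section above are simple string transformations. A minimal sketch of four of them, assuming plain-text input and, for the name augmenter, that the caller passes in the text of a single person entity. It is an illustration, not DaCy's augmenter implementation; mapping the capitals Æ, Ø, and Å to Ae, Oe, and Aa is an assumption of this sketch.

```python
def aeoeaa(text: str) -> str:
    """Æøå augmentation: replace æ, ø, å with ae, oe, aa (capitals handled as an assumption)."""
    table = str.maketrans({"æ": "ae", "ø": "oe", "å": "aa", "Æ": "Ae", "Ø": "Oe", "Å": "Aa"})
    return text.translate(table)


def lowercase(text: str) -> str:
    """Lowercase augmentation."""
    return text.lower()


def no_spacing(text: str) -> str:
    """No Spacing augmentation: drop every whitespace character."""
    return "".join(text.split())


def abbreviate_first_names(name: str) -> str:
    """Abbreviate all but the last part of a person name, e.g. 'Kenneth Enevoldsen' -> 'K. Enevoldsen'."""
    parts = name.split()
    if len(parts) < 2:
        return name  # a single-part name has no first name to abbreviate
    return " ".join([f"{part[0]}." for part in parts[:-1]] + [parts[-1]])


sentence = "Kenneth Enevoldsen spiser blåbærgrød i Århus"
print(aeoeaa(sentence))
print(no_spacing(lowercase(sentence)))
print(abbreviate_first_names("Kenneth Enevoldsen"))
```

Because none of these functions uses randomness, a single evaluation per augmentation suffices, which is why the deterministic table has no standard deviations.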

  <a href="https://github.com/centre-for-humanities-computing/Dacy"><img src="https://centre-for-humanities-computing.github.io/DaCy/_static/icon.png" width="175" height="175" align="right" /></a>

+ # DaCy medium

  DaCy is a Danish language processing framework with state-of-the-art pipelines as well as functionality for analysing Danish pipelines.
+ DaCy's largest pipeline has achieved state-of-the-art performance on part-of-speech tagging and dependency
+ parsing for Danish on the DaNE dataset. Check out the [DaCy repository](https://github.com/centre-for-humanities-computing/DaCy) to read more about how to use DaCy and reproduce the results.
  DaCy also contains guides on usage of the package as well as behavioural test for biases and robustness of Danish NLP pipelines.
+

  | Feature | Description |
  | --- | --- |
  | **Name** | `da_dacy_medium_trf` |
+ | **Version** | `0.2.0` |
+ | **spaCy** | `>=3.5.2,<3.6.0` |
+ | **Default Pipeline** | `transformer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `parser`, `ner`, `coref`, `span_resolver`, `span_cleaner`, `entity_linker` |
+ | **Components** | `transformer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `parser`, `ner`, `coref`, `span_resolver`, `span_cleaner`, `entity_linker` |
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
+ | **Sources** | [UD Danish DDT v2.11](https://github.com/UniversalDependencies/UD_Danish-DDT) (Johannsen, Anders; Martínez Alonso, Héctor; Plank, Barbara)<br />[DaNE](https://huggingface.co/datasets/dane) (Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders Søgaard)<br />[DaCoref](https://huggingface.co/datasets/alexandrainst/dacoref) (Buch-Kromann, Matthias)<br />[DaNED](https://danlp-alexandra.readthedocs.io/en/stable/docs/datasets.html#daned) (Barrett, M. J., Lam, H., Wu, M., Lacroix, O., Plank, B., & Søgaard, A.)<br />[vesteinn/DanskBERT](https://huggingface.co/vesteinn/DanskBERT) (Vésteinn Snæbjarnarson) |
  | **License** | `Apache-2.0 License` |
+ | **Author** | [Kenneth Enevoldsen](https://chcaa.io/#/) |

  ### Label Scheme

  <details>

+ <summary>View label scheme (211 labels for 4 components)</summary>

  | Component | Labels |
  | --- | --- |
+ | **`tagger`** | `ADJ`, `ADP`, `ADV`, `AUX`, `CCONJ`, `DET`, `INTJ`, `NOUN`, `NUM`, `PART`, `PRON`, `PROPN`, `PUNCT`, `SCONJ`, `SYM`, `VERB`, `X` |
+ | **`morphologizer`** | `AdpType=Prep\|POS=ADP`, `Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=AUX\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=PROPN`, `Definite=Ind\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `POS=SCONJ`, `Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=ADV`, `Number=Plur\|POS=DET\|PronType=Dem`, `Degree=Pos\|Number=Plur\|POS=ADJ`, `Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `POS=PUNCT`, `NumType=Ord\|POS=ADJ`, `POS=CCONJ`, `Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `POS=VERB\|VerbForm=Inf\|Voice=Act`, `Case=Acc\|Gender=Neut\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Degree=Sup\|POS=ADV`, `Degree=Pos\|POS=ADV`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Number=Plur\|POS=DET\|PronType=Ind`, `POS=ADP`, `POS=ADV\|PartType=Inf`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Mood=Ind\|POS=AUX\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `POS=ADP\|PartType=Inf`, `Definite=Ind\|Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `NumType=Card\|POS=NUM`, `Degree=Pos\|POS=ADJ`, `Definite=Ind\|Number=Sing\|POS=AUX\|Tense=Past\|VerbForm=Part`, `POS=PART\|PartType=Inf`, `Case=Acc\|POS=PRON\|Person=3\|PronType=Prs\|Reflex=Yes`, `Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Number[psor]=Plur\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, `POS=VERB\|Tense=Pres\|VerbForm=Part`, `Case=Nom\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Gen\|Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `Definite=Def\|Degree=Sup\|Number=Plur\|POS=ADJ`, `Case=Acc\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `POS=AUX\|VerbForm=Inf\|Voice=Act`, 
`Definite=Ind\|Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Definite=Ind\|Degree=Cmp\|Number=Sing\|POS=ADJ`, `Degree=Cmp\|POS=ADJ`, `POS=PRON\|PartType=Inf`, `Definite=Ind\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Case=Nom\|Gender=Com\|POS=PRON\|PronType=Ind`, `Number=Plur\|POS=PRON\|PronType=Ind`, `POS=INTJ`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Dem`, `Case=Gen\|Number=Plur\|POS=DET\|PronType=Ind`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Pass`, `Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Degree=Cmp\|POS=ADV`, `Number=Plur\|Number[psor]=Plur\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Case=Gen\|POS=PROPN`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Ind`, `Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Definite=Def\|Degree=Sup\|POS=ADJ`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Ind`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Dem`, `Definite=Def\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `POS=PRON\|PronType=Dem`, `Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `Number=Plur\|POS=NUM`, `POS=VERB\|VerbForm=Inf\|Voice=Pass`, `Definite=Def\|Degree=Sup\|Number=Sing\|POS=ADJ`, `Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `POS=PRON`, `Definite=Ind\|Number=Sing\|POS=NOUN`, `Definite=Ind\|Number=Sing\|POS=NUM`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, 
`Foreign=Yes\|POS=ADV`, `POS=NOUN`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Gender=Com\|Number=Plur\|POS=NOUN`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Ind`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Degree=Sup\|POS=ADJ`, `Degree=Pos\|Number=Sing\|POS=ADJ`, `Mood=Imp\|POS=VERB`, `Case=Nom\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `Case=Acc\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `POS=X`, `Case=Gen\|Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `Number=Plur\|POS=PRON\|PronType=Dem`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Number=Plur\|POS=PRON\|PronType=Int,Rel`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Degree=Cmp\|Number=Plur\|POS=ADJ`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Com\|POS=PRON\|PronType=Int,Rel`, `Case=Gen\|Degree=Pos\|Number=Plur\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `POS=VERB\|VerbForm=Ger`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Dem`, `Case=Gen\|POS=PRON\|PronType=Int,Rel`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Pass`, `Abbr=Yes\|POS=X`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Definite=Ind\|Number=Plur\|POS=NOUN`, `Foreign=Yes\|POS=X`, 
`Number=Plur\|POS=PRON\|PronType=Rcp`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Case=Gen\|Degree=Cmp\|POS=ADJ`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Dem`, `Number=Plur\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Gender=Neut\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Number=Plur\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Number=Plur\|POS=PRON\|PronType=Rcp`, `POS=DET\|Person=2\|Polite=Form\|Poss=Yes\|PronType=Prs`, `POS=SYM`, `POS=DET\|PronType=Dem`, `Gender=Com\|Number=Sing\|POS=NUM`, `Number[psor]=Plur\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Degree=Abs\|POS=ADJ`, `POS=VERB\|Tense=Pres`, `Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NUM`, `Degree=Abs\|POS=ADV`, `Case=Gen\|Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Ind\|Degree=Sup\|Number=Sing\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Number[psor]=Plur\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `Definite=Ind\|POS=NOUN`, `Case=Gen\|Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Definite=Ind\|Gender=Com\|Number=Sing\|POS=NUM`, `Definite=Def\|Number=Plur\|POS=NOUN`, `Case=Gen\|POS=NOUN`, `POS=AUX\|Tense=Pres\|VerbForm=Part` |
+ | **`parser`** | `ROOT`, `acl:relcl`, `advcl`, `advmod`, `advmod:lmod`, `amod`, `appos`, `aux`, `case`, `cc`, `ccomp`, `compound:prt`, `conj`, `cop`, `dep`, `det`, `expl`, `fixed`, `flat`, `iobj`, `list`, `mark`, `nmod`, `nmod:poss`, `nsubj`, `nummod`, `obj`, `obl`, `obl:lmod`, `obl:tmod`, `punct`, `xcomp` |
  | **`ner`** | `LOC`, `MISC`, `ORG`, `PER` |

  </details>
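Each `morphologizer` label in the label scheme bundles the coarse POS tag with Universal Dependencies morphological features as `Feature=Value` pairs joined by `|` (written `\|` inside the Markdown table); a feature with several values, such as `PronType=Int,Rel`, lists them comma-separated. A minimal sketch for splitting such a label string, assuming only that format; `parse_morph_label` is a hypothetical helper, not part of spaCy's API. For an analysed token, spaCy's `token.morph.to_dict()` gives a comparable feature view, without the POS entry.

```python
def parse_morph_label(label: str) -> dict[str, list[str]]:
    """Split a morphologizer label into {feature: [values]}.

    Accepts both the raw form 'Number=Plur|POS=NOUN' and the table form,
    where the separator is escaped as '\\|'.
    """
    features: dict[str, list[str]] = {}
    for pair in label.replace("\\|", "|").split("|"):
        key, _, value = pair.partition("=")
        # UD joins multiple values of one feature with commas, e.g. PronType=Int,Rel
        features[key] = value.split(",")
    return features


print(parse_morph_label("Definite=Ind|Gender=Com|Number=Sing|POS=NOUN"))
print(parse_morph_label(r"Number=Plur\|POS=PRON\|PronType=Int,Rel"))
```

Keeping every value as a list avoids special-casing the few multi-valued features.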

  | Type | Score |
  | --- | --- |
+ | `TOKEN_ACC` | 99.92 |
+ | `TOKEN_P` | 99.70 |
+ | `TOKEN_R` | 99.77 |
+ | `TOKEN_F` | 99.74 |
+ | `SENTS_P` | 98.42 |
+ | `SENTS_R` | 99.29 |
+ | `SENTS_F` | 98.85 |
+ | `TAG_ACC` | 98.47 |
+ | `POS_ACC` | 98.57 |
+ | `MORPH_ACC` | 98.14 |
+ | `MORPH_MICRO_P` | 99.10 |
+ | `MORPH_MICRO_R` | 98.77 |
+ | `MORPH_MICRO_F` | 98.93 |
+ | `DEP_UAS` | 90.84 |
+ | `DEP_LAS` | 88.33 |
+ | `ENTS_P` | 87.08 |
+ | `ENTS_R` | 84.59 |
+ | `ENTS_F` | 85.82 |
+ | `COREF_LEA_F1` | 41.18 |
+ | `COREF_LEA_PRECISION` | 48.89 |
+ | `COREF_LEA_RECALL` | 35.58 |
+ | `NEL_SCORE` | 80.12 |
+ | `NEL_MICRO_P` | 99.23 |
+ | `NEL_MICRO_R` | 67.19 |
+ | `NEL_MICRO_F` | 80.12 |
+ | `NEL_MACRO_P` | 99.39 |
+ | `NEL_MACRO_R` | 65.99 |
+ | `NEL_MACRO_F` | 78.15 |
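Each F score in the table above should be the harmonic mean of the matching precision and recall, F = 2PR / (P + R). As a quick consistency check, the sketch below recomputes the F scores from the rounded P and R columns, allowing a small tolerance for that rounding. `NEL_MACRO_F` is left out: recomputing it from `NEL_MACRO_P` and `NEL_MACRO_R` gives about 79.3 rather than 78.15, which is consistent with it being averaged per label rather than derived from the macro precision and recall (an inference, not something the table states).

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F) copied from the score table above
REPORTED = {
    "TOKEN": (99.70, 99.77, 99.74),
    "SENTS": (98.42, 99.29, 98.85),
    "MORPH_MICRO": (99.10, 98.77, 98.93),
    "ENTS": (87.08, 84.59, 85.82),
    "COREF_LEA": (48.89, 35.58, 41.18),
    "NEL_MICRO": (99.23, 67.19, 80.12),
}

for name, (p, r, reported) in REPORTED.items():
    recomputed = f1(p, r)
    # P and R are rounded to two decimals, so only demand agreement within 0.05
    assert abs(recomputed - reported) < 0.05, name
    print(f"{name:<12} reported {reported:6.2f}  recomputed {recomputed:6.2f}")
```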
+
+
+
+ ### Training
+ This model was trained using [spaCy](https://spacy.io), and all training logs are available on [Weights & Biases](https://wandb.ai/kenevoldsen/dacy-v0.2.0).
attribute_ruler/patterns DELETED
@@ -1 +0,0 @@
-
config.cfg CHANGED
@@ -1,54 +1,101 @@
  [paths]
- train = "corpus/dane/train.spacy"
- dev = "corpus/dane/dev.spacy"
- vectors = null
- raw = null
  init_tok2vec = null
- vocab_data = null

  [system]
  gpu_allocator = "pytorch"
- seed = 1

  [nlp]
  lang = "da"
- pipeline = ["transformer","morphologizer","parser","attribute_ruler","lemmatizer","ner"]
  disabled = []
  before_creation = null
  after_creation = null
  after_pipeline_creation = null
- batch_size = 64
  tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}

  [components]

- [components.attribute_ruler]
- factory = "attribute_ruler"
- validate = false

- [components.lemmatizer]
- factory = "lemmatizer"
- mode = "lookup"
- model = null
- overwrite = false

  [components.morphologizer]
  factory = "morphologizer"

  [components.morphologizer.model]
- @architectures = "spacy.Tagger.v1"
  nO = null

  [components.morphologizer.model.tok2vec]
  @architectures = "spacy-transformers.TransformerListener.v1"
  grad_factor = 1.0
- upstream = "transformer"
  pooling = {"@layers":"reduce_mean.v1"}

  [components.ner]
  factory = "ner"
  incorrect_spans_key = null
  moves = null
  update_with_oracle_cut_size = 100

  [components.ner.model]
@@ -63,96 +110,169 @@ nO = null
63
  [components.ner.model.tok2vec]
64
  @architectures = "spacy-transformers.TransformerListener.v1"
65
  grad_factor = 1.0
66
- upstream = "transformer"
67
  pooling = {"@layers":"reduce_mean.v1"}
 
68
 
69
  [components.parser]
70
  factory = "parser"
71
  learn_tokens = false
72
  min_action_freq = 30
73
  moves = null
 
74
  update_with_oracle_cut_size = 100
75
 
76
  [components.parser.model]
77
  @architectures = "spacy.TransitionBasedParser.v2"
78
  state_type = "parser"
79
  extra_state_tokens = false
80
- hidden_width = 64
81
- maxout_pieces = 2
82
  use_upper = false
83
  nO = null
84
 
85
  [components.parser.model.tok2vec]
86
  @architectures = "spacy-transformers.TransformerListener.v1"
87
  grad_factor = 1.0
88
  upstream = "transformer"
89
  pooling = {"@layers":"reduce_mean.v1"}
90
 
91
  [components.transformer]
92
  factory = "transformer"
93
  max_batch_items = 4096
94
  set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}
95
 
96
  [components.transformer.model]
97
- @architectures = "spacy-transformers.TransformerModel.v1"
98
- name = "Maltehb/danish-bert-botxo"
 
99
 
100
  [components.transformer.model.get_spans]
101
  @span_getters = "spacy-transformers.strided_spans.v1"
102
- window = 128
103
- stride = 96
104
 
105
  [components.transformer.model.tokenizer_config]
106
  use_fast = true
107
- strip_accents = false
 
108
 
109
  [corpora]
110
 
111
  [corpora.dev]
112
  @readers = "spacy.Corpus.v1"
113
- limit = 0
114
- max_length = 0
115
- path = ${paths:dev}
116
  gold_preproc = false
117
  augmenter = null
118
 
119
  [corpora.train]
120
  @readers = "spacy.Corpus.v1"
121
- path = ${paths:train}
122
- max_length = 500
123
  gold_preproc = false
 
124
  limit = 0
125
-
126
- [corpora.train.augmenter]
127
- @augmenters = "spacy.lower_case.v1"
128
- level = 0.1
129
 
130
  [training]
131
- train_corpus = "corpora.train"
132
- dev_corpus = "corpora.dev"
133
- seed = ${system:seed}
134
- gpu_allocator = ${system:gpu_allocator}
135
  dropout = 0.1
136
- accumulate_gradient = 3
137
- patience = 5000
138
  max_epochs = 0
139
  max_steps = 20000
140
- eval_frequency = 1000
141
  frozen_components = []
142
- before_to_disk = null
143
  annotating_components = []
144
 
145
  [training.batcher]
146
- @batchers = "spacy.batch_by_padded.v1"
147
- discard_oversize = true
 
148
  get_length = null
149
- size = 2000
150
- buffer = 256
151
 
152
  [training.logger]
153
- @loggers = "spacy.WandbLogger.v1"
154
- project_name = "dacy-an-efficient-pipeline-for-danish"
155
- remove_config_values = []
156
 
157
  [training.optimizer]
158
  @optimizers = "Adam.v1"
@@ -161,66 +281,44 @@ beta2 = 0.999
161
  L2_is_weight_decay = true
162
  L2 = 0.01
163
  grad_clip = 1.0
164
- use_averages = true
165
  eps = 0.00000001
166
-
167
- [training.optimizer.learn_rate]
168
- @schedules = "warmup_linear.v1"
169
- warmup_steps = 250
170
- total_steps = 20000
171
- initial_rate = 0.00005
172
 
173
  [training.score_weights]
174
- pos_acc = 0.08
175
- morph_acc = 0.08
 
176
  morph_per_feat = null
177
- dep_uas = 0.0
178
- dep_las = 0.16
 
179
  dep_las_per_type = null
180
  sents_p = null
181
  sents_r = null
182
- sents_f = 0.02
183
- lemma_acc = 0.5
184
- ents_f = 0.16
185
  ents_p = 0.0
186
  ents_r = 0.0
187
  ents_per_type = null
188
 
189
  [pretraining]
190
 
191
  [initialize]
192
- vocab_data = ${paths.vocab_data}
193
  vectors = ${paths.vectors}
194
  init_tok2vec = ${paths.init_tok2vec}
195
  before_init = null
196
  after_init = null
197
 
198
  [initialize.components]
199
 
200
- [initialize.components.morphologizer]
201
-
202
- [initialize.components.morphologizer.labels]
203
- @readers = "spacy.read_labels.v1"
204
- path = "corpus/labels/morphologizer.json"
205
- require = false
206
-
207
- [initialize.components.ner]
208
-
209
- [initialize.components.ner.labels]
210
- @readers = "spacy.read_labels.v1"
211
- path = "corpus/labels/ner.json"
212
- require = false
213
-
214
- [initialize.components.parser]
215
-
216
- [initialize.components.parser.labels]
217
- @readers = "spacy.read_labels.v1"
218
- path = "corpus/labels/parser.json"
219
- require = false
220
-
221
- [initialize.lookups]
222
- @misc = "spacy.LookupsDataLoader.v1"
223
- lang = ${nlp.lang}
224
- tables = ["lexeme_norm"]
225
-
226
  [initialize.tokenizer]
 
1
  [paths]
2
+ train = null
3
+ dev = null
4
  init_tok2vec = null
5
+ vectors = null
6
+ model_source = "training/da_dacy_medium_trf/model-last"
7
 
8
  [system]
9
  gpu_allocator = "pytorch"
10
+ seed = 0
11
 
12
  [nlp]
13
  lang = "da"
14
+ pipeline = ["transformer","tagger","morphologizer","trainable_lemmatizer","parser","ner","coref","span_resolver","span_cleaner","entity_linker"]
15
+ batch_size = 512
16
  disabled = []
17
  before_creation = null
18
  after_creation = null
19
  after_pipeline_creation = null
 
20
  tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
21
 
22
  [components]
23
 
24
+ [components.coref]
25
+ factory = "experimental_coref"
26
+ span_cluster_prefix = "coref_head_clusters"
27
 
28
+ [components.coref.model]
29
+ @architectures = "spacy-experimental.Coref.v1"
30
+ distance_embedding_size = 20
31
+ dropout = 0.3
32
+ hidden_size = 1024
33
+ depth = 2
34
+ antecedent_limit = 100
35
+ antecedent_batch_size = 512
36
+
37
+ [components.coref.model.tok2vec]
38
+ @architectures = "spacy-transformers.TransformerListener.v1"
39
+ grad_factor = 0.5
40
+ upstream = "transformer"
41
+ pooling = {"@layers":"reduce_mean.v1"}
42
+
43
+ [components.coref.scorer]
44
+ @scorers = "spacy-experimental.coref_scorer.v1"
45
+ span_cluster_prefix = "coref_head_clusters"
46
+
47
+ [components.entity_linker]
48
+ factory = "entity_linker"
49
+ candidates_batch_size = 1
50
+ entity_vector_length = 768
51
+ generate_empty_kb = {"@misc":"spacy.EmptyKB.v2"}
52
+ get_candidates = {"@misc":"spacy.CandidateGenerator.v1"}
53
+ get_candidates_batch = {"@misc":"spacy.CandidateBatchGenerator.v1"}
54
+ incl_context = true
55
+ incl_prior = true
56
+ labels_discard = []
57
+ n_sents = 0
58
+ overwrite = true
59
+ scorer = {"@scorers":"spacy.entity_linker_scorer.v1"}
60
+ threshold = null
61
+ use_gold_ents = true
62
+
63
+ [components.entity_linker.model]
64
+ @architectures = "spacy.EntityLinker.v2"
65
+ nO = null
66
+
67
+ [components.entity_linker.model.tok2vec]
68
+ @architectures = "spacy.HashEmbedCNN.v2"
69
+ pretrained_vectors = null
70
+ width = 96
71
+ depth = 2
72
+ embed_size = 2000
73
+ window_size = 1
74
+ maxout_pieces = 3
75
+ subword_features = true
76
 
77
  [components.morphologizer]
78
  factory = "morphologizer"
79
+ extend = false
80
+ overwrite = true
81
+ scorer = {"@scorers":"spacy.morphologizer_scorer.v1"}
82
 
83
  [components.morphologizer.model]
84
+ @architectures = "spacy.Tagger.v2"
85
  nO = null
86
+ normalize = false
87
 
88
  [components.morphologizer.model.tok2vec]
89
  @architectures = "spacy-transformers.TransformerListener.v1"
90
  grad_factor = 1.0
 
91
  pooling = {"@layers":"reduce_mean.v1"}
92
+ upstream = "transformer"
93
 
94
  [components.ner]
95
  factory = "ner"
96
  incorrect_spans_key = null
97
  moves = null
98
+ scorer = {"@scorers":"spacy.ner_scorer.v1"}
99
  update_with_oracle_cut_size = 100
100
 
101
  [components.ner.model]
 
110
  [components.ner.model.tok2vec]
111
  @architectures = "spacy-transformers.TransformerListener.v1"
112
  grad_factor = 1.0
 
113
  pooling = {"@layers":"reduce_mean.v1"}
114
+ upstream = "transformer"
115
 
116
  [components.parser]
117
  factory = "parser"
118
  learn_tokens = false
119
  min_action_freq = 30
120
  moves = null
121
+ scorer = {"@scorers":"spacy.parser_scorer.v1"}
122
  update_with_oracle_cut_size = 100
123
 
124
  [components.parser.model]
125
  @architectures = "spacy.TransitionBasedParser.v2"
126
  state_type = "parser"
127
  extra_state_tokens = false
128
+ hidden_width = 128
129
+ maxout_pieces = 3
130
  use_upper = false
131
  nO = null
132
 
133
  [components.parser.model.tok2vec]
134
  @architectures = "spacy-transformers.TransformerListener.v1"
135
  grad_factor = 1.0
136
+ pooling = {"@layers":"reduce_mean.v1"}
137
+ upstream = "transformer"
138
+
139
+ [components.span_cleaner]
140
+ factory = "experimental_span_cleaner"
141
+ prefix = "coref_head_clusters"
142
+
143
+ [components.span_resolver]
144
+ factory = "experimental_span_resolver"
145
+ input_prefix = "coref_head_clusters"
146
+ output_prefix = "coref_clusters"
147
+
148
+ [components.span_resolver.model]
149
+ @architectures = "spacy-experimental.SpanResolver.v1"
150
+ hidden_size = 1024
151
+ distance_embedding_size = 64
152
+ conv_channels = 4
153
+ window_size = 1
154
+ max_distance = 128
155
+ prefix = "coref_head_clusters"
156
+
157
+ [components.span_resolver.model.tok2vec]
158
+ @architectures = "spacy-transformers.TransformerListener.v1"
159
+ grad_factor = 0.0
160
  upstream = "transformer"
161
  pooling = {"@layers":"reduce_mean.v1"}
162
 
163
+ [components.span_resolver.scorer]
164
+ @scorers = "spacy-experimental.span_resolver_scorer.v1"
165
+ input_prefix = "coref_head_clusters"
166
+ output_prefix = "coref_clusters"
167
+
168
+ [components.tagger]
169
+ factory = "tagger"
170
+ neg_prefix = "!"
171
+ overwrite = false
172
+ scorer = {"@scorers":"spacy.tagger_scorer.v1"}
173
+
174
+ [components.tagger.model]
175
+ @architectures = "spacy.Tagger.v2"
176
+ nO = null
177
+ normalize = false
178
+
179
+ [components.tagger.model.tok2vec]
180
+ @architectures = "spacy-transformers.TransformerListener.v1"
181
+ grad_factor = 1.0
182
+ pooling = {"@layers":"reduce_mean.v1"}
183
+ upstream = "transformer"
184
+
185
+ [components.trainable_lemmatizer]
186
+ factory = "trainable_lemmatizer"
187
+ backoff = "orth"
188
+ min_tree_freq = 3
189
+ overwrite = false
190
+ scorer = {"@scorers":"spacy.lemmatizer_scorer.v1"}
191
+ top_k = 1
192
+
193
+ [components.trainable_lemmatizer.model]
194
+ @architectures = "spacy.Tagger.v2"
195
+ nO = null
196
+ normalize = false
197
+
198
+ [components.trainable_lemmatizer.model.tok2vec]
199
+ @architectures = "spacy-transformers.TransformerListener.v1"
200
+ grad_factor = 1.0
201
+ pooling = {"@layers":"reduce_mean.v1"}
202
+ upstream = "transformer"
203
+
204
  [components.transformer]
205
  factory = "transformer"
206
  max_batch_items = 4096
207
  set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}
208
 
209
  [components.transformer.model]
210
+ @architectures = "spacy-transformers.TransformerModel.v3"
211
+ name = "vesteinn/DanskBERT"
212
+ mixed_precision = false
213
 
214
  [components.transformer.model.get_spans]
215
  @span_getters = "spacy-transformers.strided_spans.v1"
216
+ window = 400
217
+ stride = 350
218
+
219
+ [components.transformer.model.grad_scaler_config]
220
 
221
  [components.transformer.model.tokenizer_config]
222
  use_fast = true
223
+
224
+ [components.transformer.model.transformer_config]
225
 
226
  [corpora]
227
 
228
  [corpora.dev]
229
  @readers = "spacy.Corpus.v1"
230
+ path = ${paths.dev}
231
  gold_preproc = false
232
+ max_length = 0
233
+ limit = 0
234
  augmenter = null
235
 
236
  [corpora.train]
237
  @readers = "spacy.Corpus.v1"
238
+ path = ${paths.train}
 
239
  gold_preproc = false
240
+ max_length = 0
241
  limit = 0
242
+ augmenter = null
243
 
244
  [training]
245
+ seed = ${system.seed}
246
+ gpu_allocator = ${system.gpu_allocator}
247
  dropout = 0.1
248
+ accumulate_gradient = 1
249
+ patience = 1600
250
  max_epochs = 0
251
  max_steps = 20000
252
+ eval_frequency = 200
253
  frozen_components = []
 
254
  annotating_components = []
255
+ dev_corpus = "corpora.dev"
256
+ train_corpus = "corpora.train"
257
+ before_to_disk = null
258
+ before_update = null
259
 
260
  [training.batcher]
261
+ @batchers = "spacy.batch_by_words.v1"
262
+ discard_oversize = false
263
+ tolerance = 0.2
264
  get_length = null
265
+
266
+ [training.batcher.size]
267
+ @schedules = "compounding.v1"
268
+ start = 100
269
+ stop = 1000
270
+ compound = 1.001
271
+ t = 0.0
272
 
273
  [training.logger]
274
+ @loggers = "spacy.ConsoleLogger.v1"
275
+ progress_bar = false
 
276
 
277
  [training.optimizer]
278
  @optimizers = "Adam.v1"
 
281
  L2_is_weight_decay = true
282
  L2 = 0.01
283
  grad_clip = 1.0
284
+ use_averages = false
285
  eps = 0.00000001
286
+ learn_rate = 0.001
287
 
288
  [training.score_weights]
289
+ tag_acc = 0.12
290
+ pos_acc = 0.06
291
+ morph_acc = 0.06
292
  morph_per_feat = null
293
+ lemma_acc = 0.12
294
+ dep_uas = 0.06
295
+ dep_las = 0.06
296
  dep_las_per_type = null
297
  sents_p = null
298
  sents_r = null
299
+ sents_f = 0.0
300
+ ents_f = 0.12
 
301
  ents_p = 0.0
302
  ents_r = 0.0
303
  ents_per_type = null
304
+ coref_f = 0.12
305
+ coref_p = null
306
+ coref_r = null
307
+ span_accuracy = 0.12
308
+ nel_micro_f = 0.12
309
+ nel_micro_r = null
310
+ nel_micro_p = null
311
 
312
  [pretraining]
313
 
314
  [initialize]
 
315
  vectors = ${paths.vectors}
316
  init_tok2vec = ${paths.init_tok2vec}
317
+ vocab_data = null
318
+ lookups = null
319
  before_init = null
320
  after_init = null
321
 
322
  [initialize.components]
323
 
324
  [initialize.tokenizer]
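This commit changes the transformer's `spacy-transformers.strided_spans.v1` span getter. The old values were `window = 128, stride = 96`, and the new values are `window = 400, stride = 350`.

Conceptually, the getter cuts each document into windows of up to `window` spaCy tokens, starting a new window every `stride` tokens. Consecutive windows therefore overlap by `window - stride` tokens: 32 before the change and 50 after.

The function below is a simplified illustration of that idea written for this note. It is not the spacy-transformers source, whose handling of the final window may differ. For a hypothetical 1,000-token document it yields 11 windows with the old settings and 3 with the new ones.

```python
from typing import List, Tuple


def strided_windows(n_tokens: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Split a document of n_tokens into (start, end) windows, end exclusive.

    Simplified illustration of a strided span getter: a new window starts every
    `stride` tokens until one window reaches the end of the document.
    """
    if window <= 0 or stride <= 0 or stride > window:
        # stride > window would leave tokens that no window covers
        raise ValueError("expected 0 < stride <= window")
    if n_tokens == 0:
        return []
    windows = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        windows.append((start, end))
        if end >= n_tokens:
            return windows
        start += stride


for window, stride in [(128, 96), (400, 350)]:  # old and new config values
    spans = strided_windows(1000, window, stride)
    print(f"window={window}, stride={stride}: {len(spans)} windows, overlap={window - stride}")
```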
coref/cfg ADDED
@@ -0,0 +1,3 @@
1
+ {
2
+ "nI":768
3
+ }
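The new coreference stack communicates through `Doc.spans` groups whose keys are set by prefixes in `config.cfg`:

- `coref` writes clusters of head tokens under `span_cluster_prefix = "coref_head_clusters"`.
- `span_resolver` reads groups with `input_prefix = "coref_head_clusters"` and writes expanded mentions under `output_prefix = "coref_clusters"`.
- `span_cleaner` removes the groups under `prefix = "coref_head_clusters"`.

The plain-Python simulation below only illustrates that data flow and does not call spaCy or spacy-experimental. It assumes keys of the form `<prefix>_<n>` numbered from 1. It also replaces the learned head-to-mention expansion with a hard-coded, hypothetical lookup.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive
SpanGroups = Dict[str, List[Span]]  # stands in for Doc.spans


def coref_step(clusters: List[List[Span]], prefix: str = "coref_head_clusters") -> SpanGroups:
    """Store each cluster of head-token spans under '<prefix>_<n>' (assumed key scheme)."""
    return {f"{prefix}_{n}": list(cluster) for n, cluster in enumerate(clusters, start=1)}


def span_resolver_step(
    groups: SpanGroups,
    expand: Callable[[Span], Span],
    input_prefix: str = "coref_head_clusters",
    output_prefix: str = "coref_clusters",
) -> SpanGroups:
    """Copy each input group to an output group, expanding every head to a full mention."""
    result = dict(groups)
    for key, spans in groups.items():
        if key.startswith(input_prefix):
            suffix = key[len(input_prefix):]  # e.g. "_1"
            result[output_prefix + suffix] = [expand(span) for span in spans]
    return result


def span_cleaner_step(groups: SpanGroups, prefix: str = "coref_head_clusters") -> SpanGroups:
    """Drop the intermediate head-only groups."""
    return {key: spans for key, spans in groups.items() if not key.startswith(prefix)}


tokens = ["Den", "danske", "statsminister", "sagde", ",", "at", "hun", "kom", "."]
# Hypothetical model output: "statsminister" and "hun" corefer (heads only)
groups = coref_step([[(2, 3), (6, 7)]])
# Hypothetical lookup standing in for the learned span resolver
full_mentions = {(2, 3): (0, 3)}
groups = span_resolver_step(groups, lambda head: full_mentions.get(head, head))
groups = span_cleaner_step(groups)
for key, spans in groups.items():
    print(key, [" ".join(tokens[start:end]) for start, end in spans])
```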
lemmatizer/lookups/lookups.bin → coref/model RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:6864ce8705293ba1b6dcf349ec133cdc33db3ba57f6e9337458cfe5073b6f103
3
- size 11537995
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:be2b3ea7f99886e627799923f87a8de20ed5c4acaf1481f4d1840694c91dbf0a
3
+ size 34992476
da_dacy_medium_trf-any-py3-none-any.whl CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:18148d5a7c83d6842645d471d6ea6973c11b1e039bdc3a1a704f4c2c7c9ea7b4
3
- size 417787063
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1df8fffe95cf7461daed0f6c757ec11171096d5be627498818c652b1aff42bcb
3
+ size 490235375
entity_linker/cfg ADDED
@@ -0,0 +1,3 @@
1
+ {
2
+ "overwrite":true
3
+ }
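The new `entity_linker` is configured with `incl_prior = true`, `incl_context = true` and `threshold = null`. As we understand spaCy's `EntityLinker`, both signals are used when both flags are on. Each knowledge-base candidate's prior probability p is combined with the cosine similarity s between the candidate's entity vector and the encoded mention context as p + s - p * s. The highest-scoring candidate is then chosen, and a non-null threshold would return NIL below it.

The sketch below re-implements that rule in plain Python. It uses hypothetical candidate IDs and three-dimensional vectors; the real vectors have `entity_vector_length = 768`. It also omits spaCy's shortcuts, such as for mentions with a single candidate. Treat the combination formula as an assumption and check the spaCy source for your version.

```python
import math
from typing import List, NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    kb_id: str               # hypothetical knowledge-base ID
    prior: float             # prior probability of this entity given the alias
    vector: Sequence[float]  # entity vector stored in the knowledge base


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_entity(
    context: Sequence[float],
    candidates: List[Candidate],
    incl_prior: bool = True,
    incl_context: bool = True,
    threshold: Optional[float] = None,
) -> Optional[str]:
    """Return the best candidate's ID, or None (NIL) if no candidate is acceptable."""
    best_id, best_score = None, -math.inf
    for cand in candidates:
        prior = cand.prior if incl_prior else 0.0
        score = prior
        if incl_context:
            sim = cosine(context, cand.vector)
            # assumed combination rule: p + s - p * s
            score = prior + sim - prior * sim
        if score > best_score:
            best_id, best_score = cand.kb_id, score
    if best_id is None or (threshold is not None and best_score < threshold):
        return None
    return best_id


context = [1.0, 0.0, 0.0]  # hypothetical encoding of the sentence around the mention
candidates = [
    Candidate("ENTITY_A", prior=0.7, vector=[0.0, 1.0, 0.0]),
    Candidate("ENTITY_B", prior=0.3, vector=[1.0, 0.0, 0.0]),
]
print(pick_entity(context, candidates))                      # context outweighs the higher prior
print(pick_entity(context, candidates, incl_context=False))  # prior alone decides
```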
transformer/model/pytorch_model.bin → entity_linker/kb/contents RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:f657cf1de5ca07b7c7940a3b91f6061a4b5bfafb8c27ba5bd96a853a3ccf4e1b
3
- size 442554327
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5249a8cbe7d241943bf2297681a7a140467506bd507922c2e47643cd64d2888c
3
+ size 8778224
entity_linker/kb/strings.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:732a4ef43858427e5728dbfe761a5a082708791b280e19b62a949dc86d94d43b
3
+ size 559069
entity_linker/model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:21879aaf49f428b07c5fb89dce1b90067dcc8e62b70c93a86522fd3461362ba6
3
+ size 3411574
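Several binary files in this commit appear here only as Git LFS pointer files: `coref/model`, the wheel, `entity_linker/kb/contents`, `entity_linker/kb/strings.json` and `entity_linker/model`. Each pointer has a `version` line, an `oid` holding the SHA-256 of the real content, and the content's `size` in bytes.

Below is a minimal sketch for checking a downloaded file against such a pointer. The function names are illustrative, and the sketch assumes `sha256` object IDs, as in every pointer shown here.

```python
import hashlib
from pathlib import Path
from typing import Dict

LFS_V1 = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    if fields.get("version") != LFS_V1:
        raise ValueError("not a Git LFS v1 pointer")
    return fields


def matches_pointer(path: Path, pointer: Dict[str, str]) -> bool:
    """True if the file at `path` has the size and SHA-256 recorded in the pointer."""
    algorithm, _, expected = pointer["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm}")
    if path.stat().st_size != int(pointer["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):  # read in 1 MiB chunks
            digest.update(chunk)
    return digest.hexdigest() == expected


# Pointer for entity_linker/model, copied from the diff above
pointer = parse_lfs_pointer(
    """version https://git-lfs.github.com/spec/v1
oid sha256:21879aaf49f428b07c5fb89dce1b90067dcc8e62b70c93a86522fd3461362ba6
size 3411574
"""
)
print(pointer["size"])  # size of the real file in bytes
# With the actual file fetched (e.g. via `git lfs pull`), one could then call:
# matches_pointer(Path("entity_linker/model"), pointer)
```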
meta.json CHANGED
@@ -1,14 +1,14 @@
1
  {
2
  "lang":"da",
3
  "name":"dacy_medium_trf",
4
- "version":"0.1.0",
5
- "description":"\n<a href=\"https://github.com/centre-for-humanities-computing/Dacy\"><img src=\"https://centre-for-humanities-computing.github.io/DaCy/_static/icon.png\" width=\"175\" height=\"175\" align=\"right\" /></a>\n\n# DaCy medium transformer\n\nDaCy is a Danish language processing framework with state-of-the-art pipelines as well as functionality for analysing Danish pipelines.\nDaCy's largest pipeline has achieved State-of-the-Art performance on Named entity recognition, part-of-speech tagging and dependency \nparsing for Danish on the DaNE dataset. Check out the [DaCy repository](https://github.com/centre-for-humanities-computing/DaCy) for material on how to use DaCy and reproduce the results. \nDaCy also contains guides on usage of the package as well as behavioural test for biases and robustness of Danish NLP pipelines.\n ",
6
- "author":"Centre for Humanities Computing Aarhus",
7
  "email":"[email protected]",
8
  "url":"https://chcaa.io/#/",
9
  "license":"Apache-2.0 License",
10
- "spacy_version":">=3.1.1,<3.2.0",
11
- "spacy_git_version":"ffaead8fe",
12
  "vectors":{
13
  "width":0,
14
  "vectors":0,
@@ -18,6 +18,25 @@
18
  "labels":{
19
  "transformer":[
20
 
21
  ],
22
  "morphologizer":[
23
  "AdpType=Prep|POS=ADP",
@@ -34,155 +53,157 @@
34
  "Degree=Pos|Number=Plur|POS=ADJ",
35
  "Definite=Ind|Gender=Com|Number=Plur|POS=NOUN",
36
  "POS=PUNCT",
 
37
  "POS=CCONJ",
38
- "Definite=Ind|Degree=Cmp|Number=Sing|POS=ADJ",
39
- "Degree=Cmp|POS=ADJ",
40
- "POS=PRON|PartType=Inf",
41
- "Gender=Com|Number=Sing|POS=DET|PronType=Ind",
42
- "Definite=Ind|Degree=Pos|Number=Sing|POS=ADJ",
43
- "Case=Acc|Gender=Neut|Number=Sing|POS=PRON|Person=3|PronType=Prs",
44
  "Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN",
45
- "Definite=Def|Degree=Pos|Number=Sing|POS=ADJ",
46
- "Gender=Neut|Number=Sing|POS=DET|PronType=Dem",
 
47
  "Degree=Pos|POS=ADV",
48
- "Definite=Def|Number=Sing|POS=VERB|Tense=Past|VerbForm=Part",
49
- "Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN",
50
- "POS=PRON|PronType=Dem",
51
- "NumType=Card|POS=NUM",
52
- "Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ",
53
- "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs",
54
- "Degree=Pos|Gender=Com|Number=Sing|POS=ADJ",
55
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs",
56
- "NumType=Ord|POS=ADJ",
57
- "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
58
  "Mood=Ind|POS=AUX|Tense=Past|VerbForm=Fin|Voice=Act",
59
- "POS=VERB|VerbForm=Inf|Voice=Act",
 
60
  "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Act",
61
- "POS=NOUN",
62
- "Mood=Ind|POS=VERB|Tense=Pres|VerbForm=Fin|Voice=Pass",
63
  "POS=ADP|PartType=Inf",
64
  "Degree=Pos|POS=ADJ",
65
  "Definite=Def|Gender=Com|Number=Plur|POS=NOUN",
66
- "Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs",
67
  "Case=Gen|Definite=Def|Gender=Com|Number=Sing|POS=NOUN",
68
  "POS=AUX|VerbForm=Inf|Voice=Act",
69
- "Definite=Ind|Degree=Pos|Gender=Com|Number=Sing|POS=ADJ",
70
  "Gender=Com|Number=Sing|POS=DET|PronType=Dem",
71
- "Number=Plur|POS=DET|PronType=Ind",
72
- "Gender=Com|Number=Sing|POS=PRON|PronType=Ind",
73
- "Case=Acc|POS=PRON|Person=3|PronType=Prs|Reflex=Yes",
74
- "POS=PART|PartType=Inf",
75
  "Gender=Neut|Number=Sing|POS=DET|PronType=Ind",
76
- "Case=Acc|Number=Plur|POS=PRON|Person=3|PronType=Prs",
77
- "Case=Gen|Definite=Def|Gender=Neut|Number=Sing|POS=NOUN",
78
- "Case=Nom|Number=Plur|POS=PRON|Person=3|PronType=Prs",
79
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs",
80
- "Case=Nom|Gender=Com|POS=PRON|PronType=Ind",
81
- "Gender=Neut|Number=Sing|POS=PRON|PronType=Ind",
82
- "Mood=Imp|POS=VERB",
83
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs",
84
- "Definite=Ind|Number=Sing|POS=AUX|Tense=Past|VerbForm=Part",
85
- "POS=X",
86
  "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs",
87
  "Case=Gen|Definite=Def|Gender=Com|Number=Plur|POS=NOUN",
88
- "POS=VERB|Tense=Pres|VerbForm=Part",
89
- "Number=Plur|POS=PRON|PronType=Int,Rel",
90
- "POS=VERB|VerbForm=Inf|Voice=Pass",
91
- "Case=Gen|Definite=Ind|Gender=Com|Number=Sing|POS=NOUN",
92
- "Degree=Cmp|POS=ADV",
93
- "POS=ADV|PartType=Inf",
94
- "Degree=Sup|POS=ADV",
95
  "Number=Plur|POS=PRON|PronType=Dem",
96
- "Number=Plur|POS=PRON|PronType=Ind",
97
- "Definite=Def|Gender=Neut|Number=Plur|POS=NOUN",
98
- "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs",
99
- "Case=Gen|POS=PROPN",
100
- "POS=ADP",
101
  "Degree=Cmp|Number=Plur|POS=ADJ",
102
- "Definite=Def|Degree=Sup|POS=ADJ",
103
- "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs",
104
- "Degree=Pos|Number=Sing|POS=ADJ",
105
- "Number=Plur|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
106
  "Gender=Com|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form",
107
  "Number=Plur|POS=PRON|PronType=Rcp",
 
108
  "Case=Gen|Degree=Cmp|POS=ADJ",
109
  "Case=Gen|Definite=Def|Gender=Neut|Number=Plur|POS=NOUN",
110
- "Number[psor]=Plur|POS=DET|Person=3|Poss=Yes|PronType=Prs",
111
- "POS=INTJ",
112
- "Number=Plur|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs",
113
- "Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ",
114
- "Gender=Neut|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form",
115
- "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs",
116
- "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs",
117
- "Case=Gen|Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN",
118
- "Number=Sing|POS=PRON|PronType=Int,Rel",
119
  "Number=Plur|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form",
120
- "Gender=Neut|Number=Sing|POS=PRON|PronType=Int,Rel",
121
- "Definite=Def|Degree=Sup|Number=Plur|POS=ADJ",
122
- "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs",
123
- "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
124
- "Definite=Ind|Number=Sing|POS=NOUN",
125
- "Number=Plur|POS=VERB|Tense=Past|VerbForm=Part",
126
  "Number=Plur|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
127
- "POS=SYM",
128
- "Case=Nom|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs",
129
- "Degree=Sup|POS=ADJ",
130
- "Number=Plur|POS=DET|PronType=Ind|Style=Arch",
131
- "Case=Gen|Gender=Com|Number=Sing|POS=DET|PronType=Dem",
132
- "Foreign=Yes|POS=X",
133
  "POS=DET|Person=2|Polite=Form|Poss=Yes|PronType=Prs",
134
- "Gender=Neut|Number=Sing|POS=PRON|PronType=Dem",
135
- "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs",
136
- "Case=Gen|Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN",
137
- "Case=Gen|POS=PRON|PronType=Int,Rel",
138
- "Gender=Com|Number=Sing|POS=PRON|PronType=Dem",
139
- "Abbr=Yes|POS=X",
140
- "Case=Gen|Definite=Ind|Gender=Com|Number=Plur|POS=NOUN",
141
  "Definite=Def|Degree=Abs|POS=ADJ",
142
- "Definite=Ind|Degree=Sup|Number=Sing|POS=ADJ",
143
- "Definite=Ind|POS=NOUN",
144
- "Gender=Com|Number=Plur|POS=NOUN",
145
- "Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs",
146
- "Gender=Com|POS=PRON|PronType=Int,Rel",
147
- "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs",
148
  "Degree=Abs|POS=ADV",
149
- "POS=VERB|VerbForm=Ger",
150
- "POS=VERB|Tense=Past|VerbForm=Part",
151
- "Definite=Def|Degree=Sup|Number=Sing|POS=ADJ",
152
- "Number=Plur|Number[psor]=Plur|POS=PRON|Person=1|Poss=Yes|PronType=Prs|Style=Form",
153
  "Case=Gen|Definite=Def|Degree=Pos|Number=Sing|POS=ADJ",
154
- "Case=Gen|Degree=Pos|Number=Plur|POS=ADJ",
155
- "Case=Acc|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs",
156
  "Gender=Com|Number=Sing|POS=PRON|PronType=Int,Rel",
157
- "POS=VERB|Tense=Pres",
158
- "Case=Gen|Number=Plur|POS=DET|PronType=Ind",
159
- "Number[psor]=Plur|POS=DET|Person=2|Poss=Yes|PronType=Prs",
160
- "POS=PRON|Person=2|Polite=Form|Poss=Yes|PronType=Prs",
161
  "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs",
162
- "POS=AUX|Tense=Pres|VerbForm=Part",
163
- "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Pass",
164
- "Gender=Com|Number=Sing|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
165
- "Degree=Sup|Number=Plur|POS=ADJ",
166
- "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs",
167
- "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
168
- "Definite=Ind|Number=Plur|POS=NOUN",
169
- "Case=Gen|Number=Plur|POS=VERB|Tense=Past|VerbForm=Part",
170
- "Mood=Imp|POS=AUX",
171
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=PRON|Person=1|Poss=Yes|PronType=Prs",
172
- "Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs",
173
- "Definite=Def|Gender=Com|Number=Sing|POS=VERB|Tense=Past|VerbForm=Part",
174
  "Number=Plur|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs",
175
  "Case=Gen|Gender=Com|Number=Sing|POS=DET|PronType=Ind",
176
  "Case=Gen|POS=NOUN",
177
- "Number[psor]=Plur|POS=PRON|Person=3|Poss=Yes|PronType=Prs",
178
- "POS=DET|PronType=Dem",
179
- "Definite=Def|Number=Plur|POS=NOUN"
180
  ],
181
  "parser":[
182
  "ROOT",
183
  "acl:relcl",
184
  "advcl",
185
  "advmod",
 
186
  "amod",
187
  "appos",
188
  "aux",
@@ -206,376 +227,458 @@
206
  "nummod",
207
  "obj",
208
  "obl",
209
- "obl:loc",
210
  "obl:tmod",
211
  "punct",
212
  "xcomp"
213
- ],
214
- "attribute_ruler":[
215
-
216
- ],
217
- "lemmatizer":[
218
-
219
  ],
220
  "ner":[
221
  "LOC",
222
  "MISC",
223
  "ORG",
224
  "PER"
225
  ]
226
  },
227
  "pipeline":[
228
  "transformer",
 
229
  "morphologizer",
 
230
  "parser",
231
- "attribute_ruler",
232
- "lemmatizer",
233
- "ner"
234
  ],
235
  "components":[
236
  "transformer",
 
237
  "morphologizer",
 
238
  "parser",
239
- "attribute_ruler",
240
- "lemmatizer",
241
- "ner"
242
  ],
243
  "disabled":[
244
 
245
  ],
246
- "_sourced_vectors_hashes":{
247
-
248
- },
 
249
  "performance":{
250
- "pos_acc":0.9744285161,
251
- "morph_acc":0.9723944208,
252
  "morph_per_feat":{
253
- "Mood":{
254
- "p":0.9942473634,
255
- "r":0.9885605338,
256
- "f":0.9913957935
257
- },
258
- "Tense":{
259
- "p":0.9841029523,
260
- "r":0.9789156627,
261
- "f":0.9815024538
262
  },
263
- "VerbForm":{
264
- "p":0.9852125693,
265
- "r":0.9785801714,
266
- "f":0.9818851704
267
  },
268
- "Voice":{
269
- "p":0.9947407964,
270
- "r":0.9895366218,
271
- "f":0.9921318846
272
  },
273
  "Definite":{
274
- "p":0.9879711307,
275
- "r":0.9735282497,
276
- "f":0.9806965174
277
  },
278
  "Gender":{
279
- "p":0.9828686597,
280
- "r":0.9724160851,
281
- "f":0.9776144337
282
  },
283
- "Number":{
284
- "p":0.986803906,
285
- "r":0.9752217006,
286
- "f":0.9809786173
287
  },
288
  "AdpType":{
289
- "p":0.9946714032,
290
- "r":0.9902740937,
291
- "f":0.9924678777
292
  },
293
- "PartType":{
294
- "p":1.0,
295
- "r":0.9967532468,
296
- "f":0.9983739837
297
  },
298
  "Case":{
299
- "p":0.9951923077,
300
- "r":0.981042654,
301
- "f":0.9880668258
302
  },
303
  "Person":{
304
- "p":0.9875,
305
- "r":0.9822380107,
306
- "f":0.9848619768
307
- },
308
- "PronType":{
309
- "p":0.9950413223,
310
- "r":0.9901315789,
311
- "f":0.9925803792
312
  },
313
- "NumType":{
314
- "p":0.9798657718,
315
- "r":0.9668874172,
316
- "f":0.9733333333
317
  },
318
- "Degree":{
319
- "p":0.9754901961,
320
- "r":0.9590361446,
321
- "f":0.9671931956
322
  },
323
- "Reflex":{
324
  "p":1.0,
325
- "r":1.0,
326
- "f":1.0
327
  },
328
  "Polite":{
329
- "p":0.0,
330
- "r":0.0,
331
- "f":0.0
332
- },
333
- "Number[psor]":{
334
- "p":0.9770114943,
335
- "r":0.988372093,
336
- "f":0.9826589595
337
  },
338
- "Poss":{
339
  "p":1.0,
340
- "r":0.9886363636,
341
- "f":0.9942857143
342
  },
343
  "Foreign":{
344
- "p":1.0,
345
- "r":0.4,
346
- "f":0.5714285714
347
- },
348
- "Abbr":{
349
- "p":1.0,
350
- "r":0.4,
351
- "f":0.5714285714
352
  },
353
  "Style":{
354
  "p":1.0,
355
  "r":1.0,
356
  "f":1.0
357
  }
358
  },
359
- "dep_uas":0.8714971531,
360
- "dep_las":0.8396963608,
361
  "dep_las_per_type":{
362
- "advmod":{
363
- "p":0.793006993,
364
- "r":0.8008474576,
365
- "f":0.796907941
366
  },
367
- "root":{
368
- "p":0.8450704225,
369
- "r":0.8510638298,
370
- "f":0.8480565371
371
  },
372
- "nsubj":{
373
- "p":0.9174603175,
374
- "r":0.914556962,
375
- "f":0.9160063391
376
  },
377
- "case":{
378
- "p":0.9192422732,
379
- "r":0.9110671937,
380
- "f":0.9151364764
381
  },
382
- "obl":{
383
- "p":0.7719568567,
384
- "r":0.7791601866,
385
- "f":0.7755417957
386
  },
387
  "cc":{
388
- "p":0.851744186,
389
- "r":0.851744186,
390
- "f":0.851744186
391
  },
392
  "conj":{
393
- "p":0.7320954907,
394
- "r":0.736,
395
- "f":0.7340425532
396
  },
397
- "obj":{
398
- "p":0.8736263736,
399
- "r":0.9262135922,
400
- "f":0.8991517436
401
  },
402
  "aux":{
403
- "p":0.8796561605,
404
- "r":0.8950437318,
405
- "f":0.887283237
406
  },
407
- "acl:relcl":{
408
- "p":0.729281768,
409
- "r":0.7135135135,
410
- "f":0.7213114754
411
  },
412
- "obl:loc":{
413
- "p":0.7285714286,
414
- "r":0.7285714286,
415
- "f":0.7285714286
416
  },
417
  "det":{
418
- "p":0.9339933993,
419
- "r":0.9324546952,
420
- "f":0.933223413
421
  },
422
- "amod":{
423
- "p":0.8799313894,
424
- "r":0.8754266212,
425
- "f":0.877673225
426
  },
427
  "nmod:poss":{
428
- "p":0.702970297,
429
- "r":0.702970297,
430
- "f":0.702970297
431
- },
432
- "ccomp":{
433
- "p":0.75,
434
- "r":0.7741935484,
435
- "f":0.7619047619
436
  },
437
- "nummod":{
438
- "p":0.808,
439
- "r":0.8416666667,
440
- "f":0.8244897959
441
  },
442
- "flat":{
443
- "p":0.8881578947,
444
- "r":0.8940397351,
445
- "f":0.8910891089
446
  },
447
- "compound:prt":{
448
- "p":0.7,
449
- "r":0.512195122,
450
- "f":0.5915492958
451
  },
452
  "advcl":{
453
- "p":0.7280701754,
454
- "r":0.7155172414,
455
- "f":0.7217391304
456
- },
457
- "mark":{
458
- "p":0.9145833333,
459
- "r":0.9014373717,
460
- "f":0.9079627715
461
- },
462
- "cop":{
463
- "p":0.8850574713,
464
- "r":0.88,
465
- "f":0.88252149
466
  },
467
  "dep":{
468
- "p":0.1855670103,
469
- "r":0.3396226415,
470
- "f":0.24
471
  },
472
- "nmod":{
473
- "p":0.7370600414,
474
- "r":0.6953125,
475
- "f":0.7155778894
476
  },
477
  "iobj":{
478
- "p":1.0,
479
- "r":0.6363636364,
480
- "f":0.7777777778
481
  },
482
  "xcomp":{
483
- "p":0.625,
484
- "r":0.4237288136,
485
- "f":0.5050505051
486
  },
487
- "appos":{
488
- "p":0.6486486486,
489
- "r":0.7272727273,
490
- "f":0.6857142857
491
  },
492
  "list":{
493
- "p":0.4,
494
  "r":0.3333333333,
495
- "f":0.3636363636
496
  },
497
- "vocative":{
498
  "p":0.0,
499
  "r":0.0,
500
  "f":0.0
501
  },
502
- "fixed":{
503
- "p":0.8947368421,
504
- "r":0.8095238095,
505
- "f":0.85
506
- },
507
- "expl":{
508
- "p":0.9117647059,
509
- "r":0.9117647059,
510
- "f":0.9117647059
511
  },
512
- "obl:tmod":{
513
- "p":0.8333333333,
514
- "r":0.5555555556,
515
- "f":0.6666666667
516
  },
517
  "discourse":{
518
  "p":0.0,
519
  "r":0.0,
520
  "f":0.0
521
  }
522
  },
523
- "sents_p":0.873015873,
524
- "sents_r":0.8776595745,
525
- "sents_f":0.875331565,
526
- "lemma_acc":0.8491041162,
527
- "ents_f":0.8178980229,
528
- "ents_p":0.817047817,
529
- "ents_r":0.81875,
530
  "ents_per_type":{
531
  "PER":{
532
- "p":0.896969697,
533
- "r":0.8915662651,
534
- "f":0.8942598187
535
  },
536
- "ORG":{
537
- "p":0.7228915663,
538
- "r":0.6666666667,
539
- "f":0.6936416185
540
  },
541
  "MISC":{
542
- "p":0.7190082645,
543
- "r":0.7699115044,
544
- "f":0.7435897436
545
  },
546
  "LOC":{
547
- "p":0.875,
548
- "r":0.8828828829,
549
- "f":0.8789237668
550
  }
551
- },
552
- "transformer_loss":12243.0238996088,
553
- "morphologizer_loss":3888.699029932,
554
- "parser_loss":78618.0269862204,
555
- "ner_loss":685.0320498052
556
  },
557
  "sources":[
558
  {
559
- "name":"UD Danish DDT v2.5",
560
  "url":"https://github.com/UniversalDependencies/UD_Danish-DDT",
561
  "license":"CC BY-SA 4.0",
562
  "author":"Johannsen, Anders; Mart\u00ednez Alonso, H\u00e9ctor; Plank, Barbara"
563
  },
564
  {
565
  "name":"DaNE",
566
- "url":"https://github.com/alexandrainst/danlp/blob/master/docs/datasets.md#danish-dependency-treebank-dane",
567
  "license":"CC BY-SA 4.0",
568
  "author":"Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders S\u00f8gaard"
569
  },
570
  {
571
- "name":"Maltehb/danish-bert-botxo",
572
- "author":"BotXO.ai",
573
- "url":"https://huggingface.co/Maltehb/danish-bert-botxo",
574
- "license":"CC BY 4.0"
575
  }
576
  ],
577
- "requirements":[
578
- "spacy-transformers>=1.0.3,<1.1.0"
579
- ],
580
- "notes":"\n## Bias and Robustness\n\nBesides the validation done by SpaCy on the DaNE testset, DaCy also provides a series of augmentations to the DaNE test set to see how well the models deal with these types of augmentations.\nThe can be seen as behavioural probes akinn to the NLP checklist.\n\n### Deterministic Augmentations\nDeterministic augmentations are augmentation which always yield the same result.\n\n| Augmentation | Part-of-speech tagging (Accuracy) | Morphological tagging (Accuracy) | Dependency Parsing (UAS) | Dependency Parsing (LAS) |\u00a0Sentence segmentation (F1) | Lemmatization (Accuracy) | Named entity recognition (F1) |\n| --- | --- | --- | --- | --- | --- | --- | --- |\n| No augmentation | 0.98 | 0.975 | 0.888 | 0.857 | 0.936 | 0.844 | 0.765 |\n| \u00c6\u00f8\u00e5 Augmentation | 0.963 | 0.955 | 0.88 | 0.844 | 0.944 | 0.754 | 0.712 |\n| Lowercase | 0.98 | 0.975 | 0.888 | 0.857 | 0.936 | 0.848 | 0.765 |\n| No Spacing | 0.229 | 0.229 | 0.004 | 0.004 | 0.683 | 0.225 | 0.058 |\n| Abbreviated first names | 0.976 | 0.974 | 0.885 | 0.854 | 0.934 | 0.845 | 0.741 |\n| Input size augmentation 5 sentences | 0.978 | 0.973 | 0.88 | 0.85 | 0.883 | 0.844 | 0.77 |\n| Input size augmentation 10 sentences | 0.977 | 0.973 | 0.878 | 0.847 | 0.872 | 0.844 | 0.768 |\n\n\n\n### Stochastic Augmentations\nStochastic augmentations are augmentation which are repeated mulitple times to estimate the effect of the augmentation.\n\n| Augmentation | Part-of-speech tagging (Accuracy) | Morphological tagging (Accuracy) | Dependency Parsing (UAS) | Dependency Parsing (LAS) |\u00a0Sentence segmentation (F1) | Lemmatization (Accuracy) | Named entity recognition (F1) |\n| --- | --- | --- | --- | --- | --- | --- | --- |\n| Keystroke errors 2% | 0.936 (0.002) | 0.934 (0.002) | 0.836 (0.002) | 0.795 (0.002) | 0.889 (0.002) | 0.773 (0.002) | 0.627 (0.002) |\n| Keystroke errors 5% | 0.869 (0.003) | 0.873 (0.003) | 0.753 (0.003) | 0.696 (0.003) | 0.829 (0.003) | 0.68 (0.003) | 0.487 
(0.003) |\n| Keystroke errors 15% | 0.647 (0.007) | 0.684 (0.007) | 0.5 (0.007) | 0.417 (0.007) | 0.664 (0.007) | 0.46 (0.007) | 0.256 (0.007) |\n| Danish names | 0.978 (0.0) | 0.975 (0.0) | 0.885 (0.0) | 0.855 (0.0) | 0.934 (0.0) | 0.847 (0.0) | 0.771 (0.0) |\n| Muslim names | 0.978 (0.0) | 0.975 (0.0) | 0.886 (0.0) | 0.855 (0.0) | 0.935 (0.0) | 0.847 (0.0) | 0.749 (0.0) |\n| Female names | 0.979 (0.0) | 0.975 (0.0) | 0.886 (0.0) | 0.856 (0.0) | 0.933 (0.0) | 0.847 (0.0) | 0.775 (0.0) |\n| Male names | 0.978 (0.0) | 0.975 (0.0) | 0.885 (0.0) | 0.855 (0.0) | 0.933 (0.0) | 0.847 (0.0) | 0.773 (0.0) |\n| Spacing Augmention 5% | 0.941 (0.002) | 0.937 (0.002) | 0.78 (0.002) | 0.751 (0.002) | 0.905 (0.002) | 0.812 (0.002) | 0.701 (0.002) |\n\n<details>\n\n<summary> Description of Augmenters </summary>\n\n \n\n**No augmentation:**\nApplies no augmentation to the DaNE test set.\n\n**\u00c6\u00f8\u00e5 Augmentation:**\nThis augmentation replace the \u00e6,\u00f8, and \u00e5 with their spelling variations ae, oe and aa respectively.\n\n**Lowercase:**\nThis augmentation lowercases all text.\n\n**No Spacing:**\nThis augmentation removed all spacing from the text.\n\n**Abbreviated first names:**\nThis agmentation abbreviates the first names of entities. For instance 'Kenneth Enevoldsen' would turn to 'K. Enevoldsen'.\n\n**Keystroke errors 2%:**\nThis agmentation simulate keystroke errors by replacing 2% of keys with a neighbouring key on a Danish QWERTY keyboard. As this agmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parenthesis.\n\n**Keystroke errors 5%:**\nThis agmentation simulate keystroke errors by replacing 5% of keys with a neighbouring key on a Danish QWERTY keyboard. 
As this agmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parenthesis.\n\n**Keystroke errors 15%:**\nThis agmentation simulate keystroke errors by replacing 15% of keys with a neighbouring key on a Danish QWERTY keyboard. As this agmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parenthesis.\n\n**Danish names:**\nThis agmentation replace all names with Danish names derived from Danmarks Statistik (2021). As this agmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parenthesis.\n\n**Muslim names:**\nThis agmentation replace all names with Muslim names derived from Meldgaard (2005). As this agmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parenthesis.\n\n**Female names:**\nThis agmentation replace all names with Danish female names derived from Danmarks Statistik (2021). As this agmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parenthesis.\n\n**Male names:**\nThis agmentation replace all names with Danish male names derived from Danmarks Statistik (2021). As this agmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parenthesis.\n\n**Spacing Augmention 5%:**\nThis agmentation replace all names with Danish male names derived from Danmarks Statistik (2021). As this agmentation is stochastic it is repeated 20 times to obtain a consistent estimate and the mean is provided with its standard deviation in parenthesis.\n </details> \n <br /> \n\n\n### Hardware\nThis was run an trained on a Quadro RTX 8000 GPU."
581
  }
 
1
  {
2
  "lang":"da",
3
  "name":"dacy_medium_trf",
4
+ "version":"0.2.0",
5
+ "description":"\n<a href=\"https://github.com/centre-for-humanities-computing/Dacy\"><img src=\"https://centre-for-humanities-computing.github.io/DaCy/_static/icon.png\" width=\"175\" height=\"175\" align=\"right\" /></a>\n\n# DaCy medium\n\nDaCy is a Danish language processing framework with state-of-the-art pipelines as well as functionality for analysing Danish pipelines.\nDaCy's largest pipeline has achieved State-of-the-Art performance on parts-of-speech tagging and dependency \nparsing for Danish on the DaNE dataset. To read more check out the [DaCy repository](https://github.com/centre-for-humanities-computing/DaCy) for material on how to use DaCy and reproduce the results. \nDaCy also contains guides on usage of the package as well as behavioural test for biases and robustness of Danish NLP pipelines.\n",
6
+ "author":"Kenneth Enevoldsen",
7
  "email":"[email protected]",
8
  "url":"https://chcaa.io/#/",
9
  "license":"Apache-2.0 License",
10
+ "spacy_version":">=3.5.2,<3.6.0",
11
+ "spacy_git_version":"Unknown",
12
  "vectors":{
13
  "width":0,
14
  "vectors":0,
 
18
  "labels":{
19
  "transformer":[
20
 
21
+ ],
22
+ "tagger":[
23
+ "ADJ",
24
+ "ADP",
25
+ "ADV",
26
+ "AUX",
27
+ "CCONJ",
28
+ "DET",
29
+ "INTJ",
30
+ "NOUN",
31
+ "NUM",
32
+ "PART",
33
+ "PRON",
34
+ "PROPN",
35
+ "PUNCT",
36
+ "SCONJ",
37
+ "SYM",
38
+ "VERB",
39
+ "X"
40
  ],
41
  "morphologizer":[
42
  "AdpType=Prep|POS=ADP",
 
53
  "Degree=Pos|Number=Plur|POS=ADJ",
54
  "Definite=Ind|Gender=Com|Number=Plur|POS=NOUN",
55
  "POS=PUNCT",
56
+ "NumType=Ord|POS=ADJ",
57
  "POS=CCONJ",
58
  "Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN",
59
+ "POS=VERB|VerbForm=Inf|Voice=Act",
60
+ "Case=Acc|Gender=Neut|Number=Sing|POS=PRON|Person=3|PronType=Prs",
61
+ "Degree=Sup|POS=ADV",
62
  "Degree=Pos|POS=ADV",
63
+ "Gender=Com|Number=Sing|POS=DET|PronType=Ind",
64
+ "Number=Plur|POS=DET|PronType=Ind",
65
+ "POS=ADP",
66
+ "POS=ADV|PartType=Inf",
67
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs",
68
  "Mood=Ind|POS=AUX|Tense=Past|VerbForm=Fin|Voice=Act",
69
+ "Definite=Def|Degree=Pos|Number=Sing|POS=ADJ",
70
+ "Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs",
71
  "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Act",
72
  "POS=ADP|PartType=Inf",
73
+ "Definite=Ind|Degree=Pos|Gender=Com|Number=Sing|POS=ADJ",
74
+ "NumType=Card|POS=NUM",
75
  "Degree=Pos|POS=ADJ",
76
+ "Definite=Ind|Number=Sing|POS=AUX|Tense=Past|VerbForm=Part",
77
+ "POS=PART|PartType=Inf",
78
+ "Case=Acc|POS=PRON|Person=3|PronType=Prs|Reflex=Yes",
79
  "Definite=Def|Gender=Com|Number=Plur|POS=NOUN",
80
+ "Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN",
81
+ "Number[psor]=Plur|POS=DET|Person=3|Poss=Yes|PronType=Prs",
82
+ "POS=VERB|Tense=Pres|VerbForm=Part",
83
+ "Case=Nom|Number=Plur|POS=PRON|Person=3|PronType=Prs",
84
  "Case=Gen|Definite=Def|Gender=Com|Number=Sing|POS=NOUN",
85
+ "Definite=Def|Degree=Sup|Number=Plur|POS=ADJ",
86
+ "Case=Acc|Number=Plur|POS=PRON|Person=3|PronType=Prs",
87
  "POS=AUX|VerbForm=Inf|Voice=Act",
88
+ "Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ",
89
+ "Definite=Ind|Degree=Cmp|Number=Sing|POS=ADJ",
90
+ "Degree=Cmp|POS=ADJ",
91
+ "POS=PRON|PartType=Inf",
92
+ "Definite=Ind|Degree=Pos|Number=Sing|POS=ADJ",
93
+ "Case=Nom|Gender=Com|POS=PRON|PronType=Ind",
94
+ "Number=Plur|POS=PRON|PronType=Ind",
95
+ "POS=INTJ",
96
  "Gender=Com|Number=Sing|POS=DET|PronType=Dem",
97
+ "Case=Gen|Number=Plur|POS=DET|PronType=Ind",
98
+ "Mood=Ind|POS=VERB|Tense=Pres|VerbForm=Fin|Voice=Pass",
99
+ "Definite=Def|Gender=Neut|Number=Plur|POS=NOUN",
100
+ "Degree=Cmp|POS=ADV",
101
+ "Number=Plur|Number[psor]=Plur|POS=PRON|Person=1|Poss=Yes|PronType=Prs|Style=Form",
102
+ "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs",
103
+ "Number=Plur|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
104
+ "Case=Gen|POS=PROPN",
105
+ "Gender=Neut|Number=Sing|POS=PRON|PronType=Ind",
106
+ "Number=Plur|POS=VERB|Tense=Past|VerbForm=Part",
107
+ "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
108
+ "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs",
109
+ "Definite=Def|Degree=Sup|POS=ADJ",
110
  "Gender=Neut|Number=Sing|POS=DET|PronType=Ind",
111
+ "Case=Gen|Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN",
112
+ "Gender=Neut|Number=Sing|POS=DET|PronType=Dem",
113
+ "Definite=Def|Number=Sing|POS=VERB|Tense=Past|VerbForm=Part",
114
+ "POS=PRON|PronType=Dem",
115
+ "Degree=Pos|Gender=Com|Number=Sing|POS=ADJ",
116
+ "Number=Plur|POS=NUM",
117
+ "POS=VERB|VerbForm=Inf|Voice=Pass",
118
+ "Definite=Def|Degree=Sup|Number=Sing|POS=ADJ",
119
+ "Number=Sing|POS=PRON|PronType=Int,Rel",
120
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs",
121
+ "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs",
122
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs",
123
+ "POS=PRON",
124
+ "Definite=Ind|Number=Sing|POS=NOUN",
125
+ "Definite=Ind|Number=Sing|POS=NUM",
126
+ "Case=Gen|Definite=Ind|Gender=Com|Number=Sing|POS=NOUN",
127
+ "Foreign=Yes|POS=ADV",
128
+ "POS=NOUN",
129
+ "Case=Gen|Definite=Def|Gender=Neut|Number=Sing|POS=NOUN",
130
+ "Gender=Com|Number=Plur|POS=NOUN",
131
+ "Gender=Neut|Number=Sing|POS=PRON|PronType=Int,Rel",
132
  "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs",
133
+ "Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs",
134
+ "Gender=Com|Number=Sing|POS=PRON|PronType=Ind",
135
+ "Case=Gen|Definite=Ind|Gender=Com|Number=Plur|POS=NOUN",
136
+ "Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ",
137
+ "Degree=Sup|POS=ADJ",
138
+ "Degree=Pos|Number=Sing|POS=ADJ",
139
+ "Mood=Imp|POS=VERB",
140
+ "Case=Nom|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs",
141
+ "Case=Acc|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs",
142
+ "POS=X",
143
  "Case=Gen|Definite=Def|Gender=Com|Number=Plur|POS=NOUN",
144
  "Number=Plur|POS=PRON|PronType=Dem",
145
+ "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs",
146
+ "Number=Plur|POS=PRON|PronType=Int,Rel",
147
+ "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
148
  "Degree=Cmp|Number=Plur|POS=ADJ",
149
+ "Number=Plur|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs",
150
  "Gender=Com|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form",
151
+ "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs",
152
+ "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs",
153
+ "Gender=Com|POS=PRON|PronType=Int,Rel",
154
+ "Case=Gen|Degree=Pos|Number=Plur|POS=ADJ",
155
+ "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
156
+ "POS=VERB|VerbForm=Ger",
157
+ "Gender=Com|Number=Sing|POS=PRON|PronType=Dem",
158
+ "Case=Gen|POS=PRON|PronType=Int,Rel",
159
+ "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Pass",
160
+ "Abbr=Yes|POS=X",
161
+ "Case=Gen|Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN",
162
+ "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs",
163
+ "Definite=Ind|Number=Plur|POS=NOUN",
164
+ "Foreign=Yes|POS=X",
165
  "Number=Plur|POS=PRON|PronType=Rcp",
166
+ "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs",
167
  "Case=Gen|Degree=Cmp|POS=ADJ",
168
  "Case=Gen|Definite=Def|Gender=Neut|Number=Plur|POS=NOUN",
169
+ "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs",
170
+ "Gender=Neut|Number=Sing|POS=PRON|PronType=Dem",
171
  "Number=Plur|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form",
172
+ "Gender=Neut|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form",
173
  "Number=Plur|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
174
+ "Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs",
175
+ "Case=Gen|Number=Plur|POS=PRON|PronType=Rcp",
176
  "POS=DET|Person=2|Polite=Form|Poss=Yes|PronType=Prs",
177
+ "POS=SYM",
178
+ "POS=DET|PronType=Dem",
179
+ "Gender=Com|Number=Sing|POS=NUM",
180
+ "Number[psor]=Plur|POS=DET|Person=2|Poss=Yes|PronType=Prs",
181
+ "Case=Gen|Number=Plur|POS=VERB|Tense=Past|VerbForm=Part",
182
  "Definite=Def|Degree=Abs|POS=ADJ",
183
+ "POS=VERB|Tense=Pres",
184
+ "Definite=Ind|Gender=Neut|Number=Sing|POS=NUM",
185
  "Degree=Abs|POS=ADV",
186
  "Case=Gen|Definite=Def|Degree=Pos|Number=Sing|POS=ADJ",
187
  "Gender=Com|Number=Sing|POS=PRON|PronType=Int,Rel",
188
+ "POS=VERB|Tense=Past|VerbForm=Part",
189
+ "Definite=Ind|Degree=Sup|Number=Sing|POS=ADJ",
190
  "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs",
191
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=PRON|Person=1|Poss=Yes|PronType=Prs",
192
  "Number=Plur|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs",
193
+ "Number[psor]=Plur|POS=PRON|Person=3|Poss=Yes|PronType=Prs",
194
+ "Definite=Ind|POS=NOUN",
195
  "Case=Gen|Gender=Com|Number=Sing|POS=DET|PronType=Ind",
196
+ "Definite=Ind|Gender=Com|Number=Sing|POS=NUM",
197
+ "Definite=Def|Number=Plur|POS=NOUN",
198
  "Case=Gen|POS=NOUN",
199
+ "POS=AUX|Tense=Pres|VerbForm=Part"
200
  ],
201
  "parser":[
202
  "ROOT",
203
  "acl:relcl",
204
  "advcl",
205
  "advmod",
206
+ "advmod:lmod",
207
  "amod",
208
  "appos",
209
  "aux",
 
227
  "nummod",
228
  "obj",
229
  "obl",
230
+ "obl:lmod",
231
  "obl:tmod",
232
  "punct",
233
  "xcomp"
234
  ],
235
  "ner":[
236
  "LOC",
237
  "MISC",
238
  "ORG",
239
  "PER"
240
+ ],
241
+ "coref":[
242
+
243
+ ],
244
+ "span_resolver":[
245
+
246
+ ],
247
+ "entity_linker":[
248
+
249
  ]
250
  },
251
  "pipeline":[
252
  "transformer",
253
+ "tagger",
254
  "morphologizer",
255
+ "trainable_lemmatizer",
256
  "parser",
257
+ "ner",
258
+ "coref",
259
+ "span_resolver",
260
+ "span_cleaner",
261
+ "entity_linker"
262
  ],
263
  "components":[
264
  "transformer",
265
+ "tagger",
266
  "morphologizer",
267
+ "trainable_lemmatizer",
268
  "parser",
269
+ "ner",
270
+ "coref",
271
+ "span_resolver",
272
+ "span_cleaner",
273
+ "entity_linker"
274
  ],
275
  "disabled":[
276
 
277
  ],
278
+ "requirements":[
279
+ "spacy-transformers>=1.2.3,<1.3.0",
280
+ "spacy-experimental>=0.6.2,<0.7.0"
281
+ ],
282
  "performance":{
283
+ "token_acc":0.9992023928,
284
+ "token_p":0.9970089731,
285
+ "token_r":0.9977052779,
286
+ "token_f":0.9973570039,
287
+ "sents_p":0.9842105263,
288
+ "sents_r":0.992920354,
289
+ "sents_f":0.9885462555,
290
+ "tag_acc":0.9847290149,
291
+ "pos_acc":0.985677928,
292
+ "morph_acc":0.9814371257,
293
+ "morph_micro_p":0.9910058542,
294
+ "morph_micro_r":0.9876942662,
295
+ "morph_micro_f":0.989347289,
296
  "morph_per_feat":{
297
+ "NumType":{
298
+ "p":0.987654321,
299
+ "r":0.9302325581,
300
+ "f":0.9580838323
301
  },
302
+ "Degree":{
303
+ "p":0.9894736842,
304
+ "r":0.9715762274,
305
+ "f":0.9804432855
306
  },
307
+ "Number":{
308
+ "p":0.9884148064,
309
+ "r":0.9859075536,
310
+ "f":0.987159588
311
  },
312
  "Definite":{
313
+ "p":0.9858490566,
314
+ "r":0.9837398374,
315
+ "f":0.9847933176
316
  },
317
  "Gender":{
318
+ "p":0.9869901547,
319
+ "r":0.9838766211,
320
+ "f":0.9854309286
321
  },
322
+ "Mood":{
323
+ "p":0.9971126083,
324
+ "r":0.9942418426,
325
+ "f":0.9956751562
326
+ },
327
+ "Tense":{
328
+ "p":0.9906469213,
329
+ "r":0.9906469213,
330
+ "f":0.9906469213
331
+ },
332
+ "VerbForm":{
333
+ "p":0.9924670433,
334
+ "r":0.9918444166,
335
+ "f":0.9921556323
336
+ },
337
+ "Voice":{
338
+ "p":0.997012696,
339
+ "r":0.9955257271,
340
+ "f":0.9962686567
341
  },
342
  "AdpType":{
343
+ "p":0.9990689013,
344
+ "r":0.9972118959,
345
+ "f":0.9981395349
346
  },
347
+ "PronType":{
348
+ "p":0.9954914337,
349
+ "r":0.9963898917,
350
+ "f":0.9959404601
351
  },
352
  "Case":{
353
+ "p":0.9968652038,
354
+ "r":0.9860465116,
355
+ "f":0.9914263445
356
  },
357
  "Person":{
358
+ "p":0.9930555556,
359
+ "r":0.9913344887,
360
+ "f":0.9921942758
361
  },
362
+ "Number[psor]":{
363
+ "p":0.987804878,
364
+ "r":1.0,
365
+ "f":0.9938650307
366
  },
367
+ "Poss":{
368
+ "p":0.987804878,
369
+ "r":1.0,
370
+ "f":0.9938650307
371
  },
372
+ "PartType":{
373
  "p":1.0,
374
+ "r":0.9962406015,
375
+ "f":0.9981167608
376
  },
377
  "Polite":{
378
+ "p":0.6666666667,
379
+ "r":0.6666666667,
380
+ "f":0.6666666667
381
  },
382
+ "Reflex":{
383
  "p":1.0,
384
+ "r":1.0,
385
+ "f":1.0
386
  },
387
  "Foreign":{
388
+ "p":0.5,
389
+ "r":0.2,
390
+ "f":0.2857142857
391
  },
392
  "Style":{
393
  "p":1.0,
394
  "r":1.0,
395
  "f":1.0
396
+ },
397
+ "Abbr":{
398
+ "p":0.6666666667,
399
+ "r":1.0,
400
+ "f":0.8
401
  }
402
  },
403
+ "dep_uas":0.9083920564,
404
+ "dep_las":0.883349834,
405
  "dep_las_per_type":{
406
+ "nummod":{
407
+ "p":0.7948717949,
408
+ "r":0.8230088496,
409
+ "f":0.8086956522
410
  },
411
+ "amod":{
412
+ "p":0.897810219,
413
+ "r":0.9027522936,
414
+ "f":0.9002744739
415
  },
416
+ "nmod":{
417
+ "p":0.7712418301,
418
+ "r":0.7729257642,
419
+ "f":0.772082879
420
  },
421
+ "nsubj":{
422
+ "p":0.9510638298,
423
+ "r":0.946031746,
424
+ "f":0.9485411141
425
  },
426
+ "flat":{
427
+ "p":0.9285714286,
428
+ "r":0.9680851064,
429
+ "f":0.9479166667
430
  },
431
  "cc":{
432
+ "p":0.8681672026,
433
+ "r":0.8940397351,
434
+ "f":0.88091354
435
  },
436
  "conj":{
437
+ "p":0.8862275449,
438
+ "r":0.8554913295,
439
+ "f":0.8705882353
440
  },
441
+ "root":{
442
+ "p":0.926056338,
443
+ "r":0.9309734513,
444
+ "f":0.9285083848
445
+ },
446
+ "advmod":{
447
+ "p":0.8871715611,
448
+ "r":0.8605697151,
449
+ "f":0.8736681887
450
+ },
451
+ "mark":{
452
+ "p":0.9148471616,
453
+ "r":0.9331848552,
454
+ "f":0.9239250276
455
  },
456
  "aux":{
457
+ "p":0.9875389408,
458
+ "r":0.9753846154,
459
+ "f":0.9814241486
460
  },
461
+ "ccomp":{
462
+ "p":0.7764705882,
463
+ "r":0.835443038,
464
+ "f":0.8048780488
465
  },
466
+ "case":{
467
+ "p":0.9348986126,
468
+ "r":0.9192025184,
469
+ "f":0.926984127
470
  },
471
  "det":{
472
+ "p":0.9409448819,
473
+ "r":0.9637096774,
474
+ "f":0.9521912351
475
  },
476
+ "obl":{
477
+ "p":0.8476821192,
478
+ "r":0.8114104596,
479
+ "f":0.8291497976
480
  },
481
  "nmod:poss":{
482
+ "p":0.8181818182,
483
+ "r":0.8256880734,
484
+ "f":0.8219178082
485
  },
486
+ "obj":{
487
+ "p":0.8943533698,
488
+ "r":0.9352380952,
489
+ "f":0.9143389199
490
  },
491
+ "cop":{
492
+ "p":0.8944099379,
493
+ "r":0.8834355828,
494
+ "f":0.8888888889
495
  },
496
+ "acl:relcl":{
497
+ "p":0.8343195266,
498
+ "r":0.7704918033,
499
+ "f":0.8011363636
500
  },
501
  "advcl":{
502
+ "p":0.6742857143,
503
+ "r":0.7564102564,
504
+ "f":0.7129909366
505
  },
506
  "dep":{
507
+ "p":0.1136363636,
508
+ "r":0.3333333333,
509
+ "f":0.1694915254
510
  },
511
+ "compound:prt":{
512
+ "p":0.6666666667,
513
+ "r":0.5882352941,
514
+ "f":0.625
515
+ },
516
+ "fixed":{
517
+ "p":0.9473684211,
518
+ "r":0.8709677419,
519
+ "f":0.9075630252
520
  },
521
  "iobj":{
522
+ "p":0.7692307692,
523
+ "r":0.6666666667,
524
+ "f":0.7142857143
525
+ },
526
+ "appos":{
527
+ "p":0.8181818182,
528
+ "r":0.7105263158,
529
+ "f":0.7605633803
530
+ },
531
+ "obl:tmod":{
532
+ "p":0.5,
533
+ "r":0.3125,
534
+ "f":0.3846153846
535
+ },
536
+ "advmod:lmod":{
537
+ "p":0.7678571429,
538
+ "r":0.8958333333,
539
+ "f":0.8269230769
540
  },
541
  "xcomp":{
542
+ "p":0.8913043478,
543
+ "r":0.640625,
544
+ "f":0.7454545455
545
  },
546
+ "expl":{
547
+ "p":0.9230769231,
548
+ "r":0.9230769231,
549
+ "f":0.9230769231
550
  },
551
  "list":{
552
+ "p":0.5714285714,
553
+ "r":0.2352941176,
554
+ "f":0.3333333333
555
+ },
556
+ "obl:lmod":{
557
+ "p":0.25,
558
  "r":0.3333333333,
559
+ "f":0.2857142857
560
  },
561
+ "parataxis":{
562
  "p":0.0,
563
  "r":0.0,
564
  "f":0.0
565
  },
566
+ "orphan":{
567
+ "p":0.0,
568
+ "r":0.0,
569
+ "f":0.0
570
  },
571
+ "vocative":{
572
+ "p":0.0,
573
+ "r":0.0,
574
+ "f":0.0
575
  },
576
  "discourse":{
577
  "p":0.0,
578
  "r":0.0,
579
  "f":0.0
580
+ },
581
+ "dislocated":{
582
+ "p":0.0,
583
+ "r":0.0,
584
+ "f":0.0
585
+ },
586
+ "compound":{
587
+ "p":0.0,
588
+ "r":0.0,
589
+ "f":0.0
590
  }
591
  },
592
+ "ents_p":0.8708487085,
593
+ "ents_r":0.8458781362,
594
+ "ents_f":0.8581818182,
595
  "ents_per_type":{
596
+ "LOC":{
597
+ "p":0.854368932,
598
+ "r":0.9166666667,
599
+ "f":0.8844221106
600
+ },
601
  "PER":{
602
+ "p":0.9100529101,
603
+ "r":0.9555555556,
604
+ "f":0.9322493225
605
  },
606
+ "MISC":{
607
+ "p":0.8301886792,
608
+ "r":0.7272727273,
609
+ "f":0.7753303965
610
  },
611
+ "ORG":{
612
+ "p":0.8611111111,
613
+ "r":0.7701863354,
614
+ "f":0.8131147541
615
+ }
616
+ },
617
+ "coref_lea_f1":0.4118366346,
618
+ "coref_lea_precision":0.4889169083,
619
+ "coref_lea_recall":0.3557507008,
620
+ "nel_score":0.801242236,
621
+ "nel_score_desc":"micro F",
622
+ "nel_micro_p":0.9923076923,
623
+ "nel_micro_r":0.671875,
624
+ "nel_micro_f":0.801242236,
625
+ "nel_macro_p":0.993902439,
626
+ "nel_macro_r":0.6598989464,
627
+ "nel_macro_f":0.7815238616,
628
+ "nel_f_per_type":{
629
  "MISC":{
630
+ "p":1.0,
631
+ "r":0.4117647059,
632
+ "f":0.5833333333
633
+ },
634
+ "PER":{
635
+ "p":1.0,
636
+ "r":0.7540983607,
637
+ "f":0.8598130841
638
  },
639
  "LOC":{
640
+ "p":1.0,
641
+ "r":0.8285714286,
642
+ "f":0.90625
643
+ },
644
+ "ORG":{
645
+ "p":0.9756097561,
646
+ "r":0.6451612903,
647
+ "f":0.7766990291
648
  }
649
+ }
650
  },
651
  "sources":[
652
  {
653
+ "name":"UD Danish DDT v2.11",
654
  "url":"https://github.com/UniversalDependencies/UD_Danish-DDT",
655
  "license":"CC BY-SA 4.0",
656
  "author":"Johannsen, Anders; Mart\u00ednez Alonso, H\u00e9ctor; Plank, Barbara"
657
  },
658
  {
659
  "name":"DaNE",
660
+ "url":"https://huggingface.co/datasets/dane",
661
  "license":"CC BY-SA 4.0",
662
  "author":"Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders S\u00f8gaard"
663
  },
664
  {
665
+ "name":"DaCoref",
666
+ "url":"https://huggingface.co/datasets/alexandrainst/dacoref",
667
+ "license":"CC BY-SA 4.0",
668
+ "author":"Buch-Kromann, Matthias"
669
+ },
670
+ {
671
+ "name":"DaNED",
672
+ "url":"https://danlp-alexandra.readthedocs.io/en/stable/docs/datasets.html#daned",
673
+ "license":"CC BY-SA 4.0",
674
+ "author":"Barrett, M. J., Lam, H., Wu, M., Lacroix, O., Plank, B., & S\u00f8gaard, A."
675
+ },
676
+ {
677
+ "name":"vesteinn/DanskBERT",
678
+ "author":"V\u00e9steinn Sn\u00e6bjarnarson",
679
+ "url":"https://huggingface.co/vesteinn/DanskBERT",
680
+ "license":"MIT"
681
  }
682
  ],
683
+ "notes":"\n\n### Training\nThis model was trained using [spaCy](https://spacy.io) and logged to [Weights & Biases](https://wandb.ai/kenevoldsen/dacy-v0.2.0). You can find all the training logs [here](https://wandb.ai/kenevoldsen/dacy-v0.2.0)."
684
  }
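For each label, the `performance` block stores a precision (`p`), a recall (`r`) and an F-score (`f`), together with headline scores such as `ents_f` and `sents_f`. The minimal Python sketch below makes two checks. First, it confirms that each stored `f` equals the harmonic mean 2pr/(p+r) of its `p` and `r`. Second, it computes how much the headline scores changed between the removed and added `meta.json` lines.

Some assumptions apply:
* The numbers were copied by hand from the diff above. The sketch does not read the real `meta.json` files.
* The helpers `f1` and `score_deltas` are hypothetical names, not part of spaCy or DaCy.
* The two releases list different evaluation sources (UD Danish DDT v2.5 versus v2.11), so the deltas are not a controlled comparison.

```python
"""Sanity-check and compare scores from the two meta.json "performance" blocks.

Values are copied by hand from the diff; f1 and score_deltas are
illustrative helpers (hypothetical), not spaCy or DaCy APIs.
"""

# Headline scores on the removed ("-") lines, i.e. the previous release.
OLD = {
    "sents_f": 0.875331565,
    "ents_f": 0.8178980229,
}

# Headline scores on the added ("+") lines, i.e. version 0.2.0.
NEW = {
    "sents_f": 0.9885462555,
    "ents_f": 0.8581818182,
}


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def score_deltas(old: dict, new: dict) -> dict:
    """Absolute change (new - old) for every metric present in both dicts."""
    return {key: new[key] - old[key] for key in old.keys() & new.keys()}


if __name__ == "__main__":
    # Per-type NER entries from version 0.2.0 as (p, r, stored f).
    per_ent = {
        "PER": (0.9100529101, 0.9555555556, 0.9322493225),
        "ORG": (0.8611111111, 0.7701863354, 0.8131147541),
    }
    for label, (p, r, f) in per_ent.items():
        # Recomputed F1 next to the stored value; they should agree.
        print(label, round(f1(p, r), 4), round(f, 4))
    for metric, delta in sorted(score_deltas(OLD, NEW).items()):
        print(f"{metric}: {delta:+.4f}")
```

The same `f1` check can be applied to any `p`/`r`/`f` triple in `dep_las_per_type`, `morph_per_feat` or `nel_f_per_type`. Entries with `p` and `r` both 0.0, such as `parataxis`, are handled by the zero guard.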
morphologizer/cfg CHANGED
@@ -1,4 +1,5 @@
1
  {
 
2
  "labels_morph":{
3
  "AdpType=Prep|POS=ADP":"AdpType=Prep",
4
  "Definite=Ind|Gender=Com|Number=Sing|POS=NOUN":"Definite=Ind|Gender=Com|Number=Sing",
@@ -14,149 +15,150 @@
14
  "Degree=Pos|Number=Plur|POS=ADJ":"Degree=Pos|Number=Plur",
15
  "Definite=Ind|Gender=Com|Number=Plur|POS=NOUN":"Definite=Ind|Gender=Com|Number=Plur",
16
  "POS=PUNCT":"",
 
17
  "POS=CCONJ":"",
18
- "Definite=Ind|Degree=Cmp|Number=Sing|POS=ADJ":"Definite=Ind|Degree=Cmp|Number=Sing",
19
- "Degree=Cmp|POS=ADJ":"Degree=Cmp",
20
- "POS=PRON|PartType=Inf":"PartType=Inf",
21
- "Gender=Com|Number=Sing|POS=DET|PronType=Ind":"Gender=Com|Number=Sing|PronType=Ind",
22
- "Definite=Ind|Degree=Pos|Number=Sing|POS=ADJ":"Definite=Ind|Degree=Pos|Number=Sing",
23
- "Case=Acc|Gender=Neut|Number=Sing|POS=PRON|Person=3|PronType=Prs":"Case=Acc|Gender=Neut|Number=Sing|Person=3|PronType=Prs",
24
  "Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN":"Definite=Ind|Gender=Neut|Number=Plur",
25
- "Definite=Def|Degree=Pos|Number=Sing|POS=ADJ":"Definite=Def|Degree=Pos|Number=Sing",
26
- "Gender=Neut|Number=Sing|POS=DET|PronType=Dem":"Gender=Neut|Number=Sing|PronType=Dem",
 
27
  "Degree=Pos|POS=ADV":"Degree=Pos",
28
- "Definite=Def|Number=Sing|POS=VERB|Tense=Past|VerbForm=Part":"Definite=Def|Number=Sing|Tense=Past|VerbForm=Part",
29
- "Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN":"Definite=Ind|Gender=Neut|Number=Sing",
30
- "POS=PRON|PronType=Dem":"PronType=Dem",
31
- "NumType=Card|POS=NUM":"NumType=Card",
32
- "Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ":"Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing",
33
- "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs":"Case=Acc|Gender=Com|Number=Sing|Person=3|PronType=Prs",
34
- "Degree=Pos|Gender=Com|Number=Sing|POS=ADJ":"Degree=Pos|Gender=Com|Number=Sing",
35
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs":"Case=Nom|Gender=Com|Number=Sing|Person=3|PronType=Prs",
36
- "NumType=Ord|POS=ADJ":"NumType=Ord",
37
- "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":"Gender=Com|Number=Sing|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
38
  "Mood=Ind|POS=AUX|Tense=Past|VerbForm=Fin|Voice=Act":"Mood=Ind|Tense=Past|VerbForm=Fin|Voice=Act",
39
- "POS=VERB|VerbForm=Inf|Voice=Act":"VerbForm=Inf|Voice=Act",
 
40
  "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Act":"Mood=Ind|Tense=Past|VerbForm=Fin|Voice=Act",
41
- "POS=NOUN":"",
42
- "Mood=Ind|POS=VERB|Tense=Pres|VerbForm=Fin|Voice=Pass":"Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Pass",
43
  "POS=ADP|PartType=Inf":"PartType=Inf",
44
  "Degree=Pos|POS=ADJ":"Degree=Pos",
45
  "Definite=Def|Gender=Com|Number=Plur|POS=NOUN":"Definite=Def|Gender=Com|Number=Plur",
46
- "Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs":"Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs",
47
  "Case=Gen|Definite=Def|Gender=Com|Number=Sing|POS=NOUN":"Case=Gen|Definite=Def|Gender=Com|Number=Sing",
48
  "POS=AUX|VerbForm=Inf|Voice=Act":"VerbForm=Inf|Voice=Act",
49
- "Definite=Ind|Degree=Pos|Gender=Com|Number=Sing|POS=ADJ":"Definite=Ind|Degree=Pos|Gender=Com|Number=Sing",
50
  "Gender=Com|Number=Sing|POS=DET|PronType=Dem":"Gender=Com|Number=Sing|PronType=Dem",
51
- "Number=Plur|POS=DET|PronType=Ind":"Number=Plur|PronType=Ind",
52
- "Gender=Com|Number=Sing|POS=PRON|PronType=Ind":"Gender=Com|Number=Sing|PronType=Ind",
53
- "Case=Acc|POS=PRON|Person=3|PronType=Prs|Reflex=Yes":"Case=Acc|Person=3|PronType=Prs|Reflex=Yes",
54
- "POS=PART|PartType=Inf":"PartType=Inf",
55
  "Gender=Neut|Number=Sing|POS=DET|PronType=Ind":"Gender=Neut|Number=Sing|PronType=Ind",
56
- "Case=Acc|Number=Plur|POS=PRON|Person=3|PronType=Prs":"Case=Acc|Number=Plur|Person=3|PronType=Prs",
57
- "Case=Gen|Definite=Def|Gender=Neut|Number=Sing|POS=NOUN":"Case=Gen|Definite=Def|Gender=Neut|Number=Sing",
58
- "Case=Nom|Number=Plur|POS=PRON|Person=3|PronType=Prs":"Case=Nom|Number=Plur|Person=3|PronType=Prs",
59
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs":"Case=Nom|Gender=Com|Number=Sing|Person=1|PronType=Prs",
60
- "Case=Nom|Gender=Com|POS=PRON|PronType=Ind":"Case=Nom|Gender=Com|PronType=Ind",
61
- "Gender=Neut|Number=Sing|POS=PRON|PronType=Ind":"Gender=Neut|Number=Sing|PronType=Ind",
62
- "Mood=Imp|POS=VERB":"Mood=Imp",
63
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":"Gender=Com|Number=Sing|Number[psor]=Sing|Person=1|Poss=Yes|PronType=Prs",
64
- "Definite=Ind|Number=Sing|POS=AUX|Tense=Past|VerbForm=Part":"Definite=Ind|Number=Sing|Tense=Past|VerbForm=Part",
65
- "POS=X":"",
66
  "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs":"Case=Nom|Gender=Com|Number=Plur|Person=1|PronType=Prs",
67
  "Case=Gen|Definite=Def|Gender=Com|Number=Plur|POS=NOUN":"Case=Gen|Definite=Def|Gender=Com|Number=Plur",
68
- "POS=VERB|Tense=Pres|VerbForm=Part":"Tense=Pres|VerbForm=Part",
69
- "Number=Plur|POS=PRON|PronType=Int,Rel":"Number=Plur|PronType=Int,Rel",
70
- "POS=VERB|VerbForm=Inf|Voice=Pass":"VerbForm=Inf|Voice=Pass",
71
- "Case=Gen|Definite=Ind|Gender=Com|Number=Sing|POS=NOUN":"Case=Gen|Definite=Ind|Gender=Com|Number=Sing",
72
- "Degree=Cmp|POS=ADV":"Degree=Cmp",
73
- "POS=ADV|PartType=Inf":"PartType=Inf",
74
- "Degree=Sup|POS=ADV":"Degree=Sup",
75
  "Number=Plur|POS=PRON|PronType=Dem":"Number=Plur|PronType=Dem",
76
- "Number=Plur|POS=PRON|PronType=Ind":"Number=Plur|PronType=Ind",
77
- "Definite=Def|Gender=Neut|Number=Plur|POS=NOUN":"Definite=Def|Gender=Neut|Number=Plur",
78
- "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs":"Case=Acc|Gender=Com|Number=Sing|Person=1|PronType=Prs",
79
- "Case=Gen|POS=PROPN":"Case=Gen",
80
- "POS=ADP":"",
81
  "Degree=Cmp|Number=Plur|POS=ADJ":"Degree=Cmp|Number=Plur",
82
- "Definite=Def|Degree=Sup|POS=ADJ":"Definite=Def|Degree=Sup",
83
- "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":"Gender=Neut|Number=Sing|Number[psor]=Sing|Person=1|Poss=Yes|PronType=Prs",
84
- "Degree=Pos|Number=Sing|POS=ADJ":"Degree=Pos|Number=Sing",
85
- "Number=Plur|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":"Number=Plur|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
86
  "Gender=Com|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":"Gender=Com|Number=Sing|Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs|Style=Form",
87
  "Number=Plur|POS=PRON|PronType=Rcp":"Number=Plur|PronType=Rcp",
  "Case=Gen|Degree=Cmp|POS=ADJ":"Case=Gen|Degree=Cmp",
  "Case=Gen|Definite=Def|Gender=Neut|Number=Plur|POS=NOUN":"Case=Gen|Definite=Def|Gender=Neut|Number=Plur",
- "Number[psor]=Plur|POS=DET|Person=3|Poss=Yes|PronType=Prs":"Number[psor]=Plur|Person=3|Poss=Yes|PronType=Prs",
- "POS=INTJ":"",
- "Number=Plur|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":"Number=Plur|Number[psor]=Sing|Person=1|Poss=Yes|PronType=Prs",
- "Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ":"Degree=Pos|Gender=Neut|Number=Sing",
- "Gender=Neut|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":"Gender=Neut|Number=Sing|Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs|Style=Form",
- "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs":"Case=Acc|Gender=Com|Number=Sing|Person=2|PronType=Prs",
- "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":"Gender=Com|Number=Sing|Number[psor]=Sing|Person=2|Poss=Yes|PronType=Prs",
- "Case=Gen|Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN":"Case=Gen|Definite=Ind|Gender=Neut|Number=Plur",
- "Number=Sing|POS=PRON|PronType=Int,Rel":"Number=Sing|PronType=Int,Rel",
  "Number=Plur|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":"Number=Plur|Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs|Style=Form",
- "Gender=Neut|Number=Sing|POS=PRON|PronType=Int,Rel":"Gender=Neut|Number=Sing|PronType=Int,Rel",
- "Definite=Def|Degree=Sup|Number=Plur|POS=ADJ":"Definite=Def|Degree=Sup|Number=Plur",
- "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs":"Case=Nom|Gender=Com|Number=Sing|Person=2|PronType=Prs",
- "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":"Gender=Neut|Number=Sing|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
- "Definite=Ind|Number=Sing|POS=NOUN":"Definite=Ind|Number=Sing",
- "Number=Plur|POS=VERB|Tense=Past|VerbForm=Part":"Number=Plur|Tense=Past|VerbForm=Part",
  "Number=Plur|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":"Number=Plur|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
- "POS=SYM":"",
- "Case=Nom|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs":"Case=Nom|Gender=Com|Person=2|Polite=Form|PronType=Prs",
- "Degree=Sup|POS=ADJ":"Degree=Sup",
- "Number=Plur|POS=DET|PronType=Ind|Style=Arch":"Number=Plur|PronType=Ind|Style=Arch",
- "Case=Gen|Gender=Com|Number=Sing|POS=DET|PronType=Dem":"Case=Gen|Gender=Com|Number=Sing|PronType=Dem",
- "Foreign=Yes|POS=X":"Foreign=Yes",
  "POS=DET|Person=2|Polite=Form|Poss=Yes|PronType=Prs":"Person=2|Polite=Form|Poss=Yes|PronType=Prs",
- "Gender=Neut|Number=Sing|POS=PRON|PronType=Dem":"Gender=Neut|Number=Sing|PronType=Dem",
- "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs":"Case=Acc|Gender=Com|Number=Plur|Person=1|PronType=Prs",
- "Case=Gen|Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN":"Case=Gen|Definite=Ind|Gender=Neut|Number=Sing",
- "Case=Gen|POS=PRON|PronType=Int,Rel":"Case=Gen|PronType=Int,Rel",
- "Gender=Com|Number=Sing|POS=PRON|PronType=Dem":"Gender=Com|Number=Sing|PronType=Dem",
- "Abbr=Yes|POS=X":"Abbr=Yes",
- "Case=Gen|Definite=Ind|Gender=Com|Number=Plur|POS=NOUN":"Case=Gen|Definite=Ind|Gender=Com|Number=Plur",
  "Definite=Def|Degree=Abs|POS=ADJ":"Definite=Def|Degree=Abs",
- "Definite=Ind|Degree=Sup|Number=Sing|POS=ADJ":"Definite=Ind|Degree=Sup|Number=Sing",
- "Definite=Ind|POS=NOUN":"Definite=Ind",
- "Gender=Com|Number=Plur|POS=NOUN":"Gender=Com|Number=Plur",
- "Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs":"Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs",
- "Gender=Com|POS=PRON|PronType=Int,Rel":"Gender=Com|PronType=Int,Rel",
- "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs":"Case=Nom|Gender=Com|Number=Plur|Person=2|PronType=Prs",
  "Degree=Abs|POS=ADV":"Degree=Abs",
- "POS=VERB|VerbForm=Ger":"VerbForm=Ger",
- "POS=VERB|Tense=Past|VerbForm=Part":"Tense=Past|VerbForm=Part",
- "Definite=Def|Degree=Sup|Number=Sing|POS=ADJ":"Definite=Def|Degree=Sup|Number=Sing",
- "Number=Plur|Number[psor]=Plur|POS=PRON|Person=1|Poss=Yes|PronType=Prs|Style=Form":"Number=Plur|Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs|Style=Form",
  "Case=Gen|Definite=Def|Degree=Pos|Number=Sing|POS=ADJ":"Case=Gen|Definite=Def|Degree=Pos|Number=Sing",
- "Case=Gen|Degree=Pos|Number=Plur|POS=ADJ":"Case=Gen|Degree=Pos|Number=Plur",
- "Case=Acc|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs":"Case=Acc|Gender=Com|Person=2|Polite=Form|PronType=Prs",
  "Gender=Com|Number=Sing|POS=PRON|PronType=Int,Rel":"Gender=Com|Number=Sing|PronType=Int,Rel",
- "POS=VERB|Tense=Pres":"Tense=Pres",
- "Case=Gen|Number=Plur|POS=DET|PronType=Ind":"Case=Gen|Number=Plur|PronType=Ind",
- "Number[psor]=Plur|POS=DET|Person=2|Poss=Yes|PronType=Prs":"Number[psor]=Plur|Person=2|Poss=Yes|PronType=Prs",
- "POS=PRON|Person=2|Polite=Form|Poss=Yes|PronType=Prs":"Person=2|Polite=Form|Poss=Yes|PronType=Prs",
  "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":"Gender=Neut|Number=Sing|Number[psor]=Sing|Person=2|Poss=Yes|PronType=Prs",
- "POS=AUX|Tense=Pres|VerbForm=Part":"Tense=Pres|VerbForm=Part",
- "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Pass":"Mood=Ind|Tense=Past|VerbForm=Fin|Voice=Pass",
- "Gender=Com|Number=Sing|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":"Gender=Com|Number=Sing|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
- "Degree=Sup|Number=Plur|POS=ADJ":"Degree=Sup|Number=Plur",
- "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs":"Case=Acc|Gender=Com|Number=Plur|Person=2|PronType=Prs",
- "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":"Gender=Neut|Number=Sing|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
- "Definite=Ind|Number=Plur|POS=NOUN":"Definite=Ind|Number=Plur",
- "Case=Gen|Number=Plur|POS=VERB|Tense=Past|VerbForm=Part":"Case=Gen|Number=Plur|Tense=Past|VerbForm=Part",
- "Mood=Imp|POS=AUX":"Mood=Imp",
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=PRON|Person=1|Poss=Yes|PronType=Prs":"Gender=Com|Number=Sing|Number[psor]=Sing|Person=1|Poss=Yes|PronType=Prs",
- "Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs":"Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs",
- "Definite=Def|Gender=Com|Number=Sing|POS=VERB|Tense=Past|VerbForm=Part":"Definite=Def|Gender=Com|Number=Sing|Tense=Past|VerbForm=Part",
  "Number=Plur|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":"Number=Plur|Number[psor]=Sing|Person=2|Poss=Yes|PronType=Prs",
  "Case=Gen|Gender=Com|Number=Sing|POS=DET|PronType=Ind":"Case=Gen|Gender=Com|Number=Sing|PronType=Ind",
  "Case=Gen|POS=NOUN":"Case=Gen",
- "Number[psor]=Plur|POS=PRON|Person=3|Poss=Yes|PronType=Prs":"Number[psor]=Plur|Person=3|Poss=Yes|PronType=Prs",
- "POS=DET|PronType=Dem":"PronType=Dem",
- "Definite=Def|Number=Plur|POS=NOUN":"Definite=Def|Number=Plur"
  },
  "labels_pos":{
  "AdpType=Prep|POS=ADP":85,
@@ -173,148 +175,150 @@
  "Degree=Pos|Number=Plur|POS=ADJ":84,
  "Definite=Ind|Gender=Com|Number=Plur|POS=NOUN":92,
  "POS=PUNCT":97,
  "POS=CCONJ":89,
- "Definite=Ind|Degree=Cmp|Number=Sing|POS=ADJ":84,
- "Degree=Cmp|POS=ADJ":84,
- "POS=PRON|PartType=Inf":95,
- "Gender=Com|Number=Sing|POS=DET|PronType=Ind":90,
- "Definite=Ind|Degree=Pos|Number=Sing|POS=ADJ":84,
- "Case=Acc|Gender=Neut|Number=Sing|POS=PRON|Person=3|PronType=Prs":95,
  "Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN":92,
- "Definite=Def|Degree=Pos|Number=Sing|POS=ADJ":84,
- "Gender=Neut|Number=Sing|POS=DET|PronType=Dem":90,
  "Degree=Pos|POS=ADV":86,
- "Definite=Def|Number=Sing|POS=VERB|Tense=Past|VerbForm=Part":100,
- "Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN":92,
- "POS=PRON|PronType=Dem":95,
- "NumType=Card|POS=NUM":93,
- "Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ":84,
- "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs":95,
- "Degree=Pos|Gender=Com|Number=Sing|POS=ADJ":84,
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs":95,
- "NumType=Ord|POS=ADJ":84,
- "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":90,
  "Mood=Ind|POS=AUX|Tense=Past|VerbForm=Fin|Voice=Act":87,
- "POS=VERB|VerbForm=Inf|Voice=Act":100,
  "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Act":100,
- "POS=NOUN":92,
- "Mood=Ind|POS=VERB|Tense=Pres|VerbForm=Fin|Voice=Pass":100,
  "POS=ADP|PartType=Inf":85,
  "Degree=Pos|POS=ADJ":84,
  "Definite=Def|Gender=Com|Number=Plur|POS=NOUN":92,
- "Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs":90,
  "Case=Gen|Definite=Def|Gender=Com|Number=Sing|POS=NOUN":92,
  "POS=AUX|VerbForm=Inf|Voice=Act":87,
- "Definite=Ind|Degree=Pos|Gender=Com|Number=Sing|POS=ADJ":84,
  "Gender=Com|Number=Sing|POS=DET|PronType=Dem":90,
- "Number=Plur|POS=DET|PronType=Ind":90,
- "Gender=Com|Number=Sing|POS=PRON|PronType=Ind":95,
- "Case=Acc|POS=PRON|Person=3|PronType=Prs|Reflex=Yes":95,
- "POS=PART|PartType=Inf":94,
  "Gender=Neut|Number=Sing|POS=DET|PronType=Ind":90,
- "Case=Acc|Number=Plur|POS=PRON|Person=3|PronType=Prs":95,
- "Case=Gen|Definite=Def|Gender=Neut|Number=Sing|POS=NOUN":92,
- "Case=Nom|Number=Plur|POS=PRON|Person=3|PronType=Prs":95,
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs":95,
- "Case=Nom|Gender=Com|POS=PRON|PronType=Ind":95,
- "Gender=Neut|Number=Sing|POS=PRON|PronType=Ind":95,
- "Mood=Imp|POS=VERB":100,
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":90,
- "Definite=Ind|Number=Sing|POS=AUX|Tense=Past|VerbForm=Part":87,
- "POS=X":101,
  "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs":95,
  "Case=Gen|Definite=Def|Gender=Com|Number=Plur|POS=NOUN":92,
- "POS=VERB|Tense=Pres|VerbForm=Part":100,
- "Number=Plur|POS=PRON|PronType=Int,Rel":95,
- "POS=VERB|VerbForm=Inf|Voice=Pass":100,
- "Case=Gen|Definite=Ind|Gender=Com|Number=Sing|POS=NOUN":92,
- "Degree=Cmp|POS=ADV":86,
- "POS=ADV|PartType=Inf":86,
- "Degree=Sup|POS=ADV":86,
  "Number=Plur|POS=PRON|PronType=Dem":95,
- "Number=Plur|POS=PRON|PronType=Ind":95,
- "Definite=Def|Gender=Neut|Number=Plur|POS=NOUN":92,
- "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs":95,
- "Case=Gen|POS=PROPN":96,
- "POS=ADP":85,
  "Degree=Cmp|Number=Plur|POS=ADJ":84,
- "Definite=Def|Degree=Sup|POS=ADJ":84,
- "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":90,
- "Degree=Pos|Number=Sing|POS=ADJ":84,
- "Number=Plur|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":90,
  "Gender=Com|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":90,
  "Number=Plur|POS=PRON|PronType=Rcp":95,
  "Case=Gen|Degree=Cmp|POS=ADJ":84,
  "Case=Gen|Definite=Def|Gender=Neut|Number=Plur|POS=NOUN":92,
- "Number[psor]=Plur|POS=DET|Person=3|Poss=Yes|PronType=Prs":90,
- "POS=INTJ":91,
- "Number=Plur|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":90,
- "Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ":84,
- "Gender=Neut|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":90,
- "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs":95,
- "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":90,
- "Case=Gen|Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN":92,
- "Number=Sing|POS=PRON|PronType=Int,Rel":95,
  "Number=Plur|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":90,
- "Gender=Neut|Number=Sing|POS=PRON|PronType=Int,Rel":95,
- "Definite=Def|Degree=Sup|Number=Plur|POS=ADJ":84,
- "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs":95,
- "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":90,
- "Definite=Ind|Number=Sing|POS=NOUN":92,
- "Number=Plur|POS=VERB|Tense=Past|VerbForm=Part":100,
  "Number=Plur|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":95,
- "POS=SYM":99,
- "Case=Nom|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs":95,
- "Degree=Sup|POS=ADJ":84,
- "Number=Plur|POS=DET|PronType=Ind|Style=Arch":90,
- "Case=Gen|Gender=Com|Number=Sing|POS=DET|PronType=Dem":90,
- "Foreign=Yes|POS=X":101,
  "POS=DET|Person=2|Polite=Form|Poss=Yes|PronType=Prs":90,
- "Gender=Neut|Number=Sing|POS=PRON|PronType=Dem":95,
- "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs":95,
- "Case=Gen|Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN":92,
- "Case=Gen|POS=PRON|PronType=Int,Rel":95,
- "Gender=Com|Number=Sing|POS=PRON|PronType=Dem":95,
- "Abbr=Yes|POS=X":101,
- "Case=Gen|Definite=Ind|Gender=Com|Number=Plur|POS=NOUN":92,
  "Definite=Def|Degree=Abs|POS=ADJ":84,
- "Definite=Ind|Degree=Sup|Number=Sing|POS=ADJ":84,
- "Definite=Ind|POS=NOUN":92,
- "Gender=Com|Number=Plur|POS=NOUN":92,
- "Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs":90,
- "Gender=Com|POS=PRON|PronType=Int,Rel":95,
- "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs":95,
  "Degree=Abs|POS=ADV":86,
- "POS=VERB|VerbForm=Ger":100,
- "POS=VERB|Tense=Past|VerbForm=Part":100,
- "Definite=Def|Degree=Sup|Number=Sing|POS=ADJ":84,
- "Number=Plur|Number[psor]=Plur|POS=PRON|Person=1|Poss=Yes|PronType=Prs|Style=Form":95,
  "Case=Gen|Definite=Def|Degree=Pos|Number=Sing|POS=ADJ":84,
- "Case=Gen|Degree=Pos|Number=Plur|POS=ADJ":84,
- "Case=Acc|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs":95,
  "Gender=Com|Number=Sing|POS=PRON|PronType=Int,Rel":95,
- "POS=VERB|Tense=Pres":100,
- "Case=Gen|Number=Plur|POS=DET|PronType=Ind":90,
- "Number[psor]=Plur|POS=DET|Person=2|Poss=Yes|PronType=Prs":90,
- "POS=PRON|Person=2|Polite=Form|Poss=Yes|PronType=Prs":95,
  "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":90,
- "POS=AUX|Tense=Pres|VerbForm=Part":87,
- "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Pass":100,
- "Gender=Com|Number=Sing|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":95,
- "Degree=Sup|Number=Plur|POS=ADJ":84,
- "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs":95,
- "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":95,
- "Definite=Ind|Number=Plur|POS=NOUN":92,
- "Case=Gen|Number=Plur|POS=VERB|Tense=Past|VerbForm=Part":100,
- "Mood=Imp|POS=AUX":87,
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=PRON|Person=1|Poss=Yes|PronType=Prs":95,
- "Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs":95,
- "Definite=Def|Gender=Com|Number=Sing|POS=VERB|Tense=Past|VerbForm=Part":100,
  "Number=Plur|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":90,
  "Case=Gen|Gender=Com|Number=Sing|POS=DET|PronType=Ind":90,
  "Case=Gen|POS=NOUN":92,
- "Number[psor]=Plur|POS=PRON|Person=3|Poss=Yes|PronType=Prs":95,
- "POS=DET|PronType=Dem":90,
- "Definite=Def|Number=Plur|POS=NOUN":92
- }
  }
 
  {
+ "extend":false,
  "labels_morph":{
  "AdpType=Prep|POS=ADP":"AdpType=Prep",
  "Definite=Ind|Gender=Com|Number=Sing|POS=NOUN":"Definite=Ind|Gender=Com|Number=Sing",
 
  "Degree=Pos|Number=Plur|POS=ADJ":"Degree=Pos|Number=Plur",
  "Definite=Ind|Gender=Com|Number=Plur|POS=NOUN":"Definite=Ind|Gender=Com|Number=Plur",
  "POS=PUNCT":"",
+ "NumType=Ord|POS=ADJ":"NumType=Ord",
  "POS=CCONJ":"",
  "Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN":"Definite=Ind|Gender=Neut|Number=Plur",
+ "POS=VERB|VerbForm=Inf|Voice=Act":"VerbForm=Inf|Voice=Act",
+ "Case=Acc|Gender=Neut|Number=Sing|POS=PRON|Person=3|PronType=Prs":"Case=Acc|Gender=Neut|Number=Sing|Person=3|PronType=Prs",
+ "Degree=Sup|POS=ADV":"Degree=Sup",
  "Degree=Pos|POS=ADV":"Degree=Pos",
+ "Gender=Com|Number=Sing|POS=DET|PronType=Ind":"Gender=Com|Number=Sing|PronType=Ind",
+ "Number=Plur|POS=DET|PronType=Ind":"Number=Plur|PronType=Ind",
+ "POS=ADP":"",
+ "POS=ADV|PartType=Inf":"PartType=Inf",
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs":"Case=Nom|Gender=Com|Number=Sing|Person=3|PronType=Prs",
  "Mood=Ind|POS=AUX|Tense=Past|VerbForm=Fin|Voice=Act":"Mood=Ind|Tense=Past|VerbForm=Fin|Voice=Act",
+ "Definite=Def|Degree=Pos|Number=Sing|POS=ADJ":"Definite=Def|Degree=Pos|Number=Sing",
+ "Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs":"Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs",
  "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Act":"Mood=Ind|Tense=Past|VerbForm=Fin|Voice=Act",
  "POS=ADP|PartType=Inf":"PartType=Inf",
+ "Definite=Ind|Degree=Pos|Gender=Com|Number=Sing|POS=ADJ":"Definite=Ind|Degree=Pos|Gender=Com|Number=Sing",
+ "NumType=Card|POS=NUM":"NumType=Card",
  "Degree=Pos|POS=ADJ":"Degree=Pos",
+ "Definite=Ind|Number=Sing|POS=AUX|Tense=Past|VerbForm=Part":"Definite=Ind|Number=Sing|Tense=Past|VerbForm=Part",
+ "POS=PART|PartType=Inf":"PartType=Inf",
+ "Case=Acc|POS=PRON|Person=3|PronType=Prs|Reflex=Yes":"Case=Acc|Person=3|PronType=Prs|Reflex=Yes",
  "Definite=Def|Gender=Com|Number=Plur|POS=NOUN":"Definite=Def|Gender=Com|Number=Plur",
+ "Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN":"Definite=Ind|Gender=Neut|Number=Sing",
+ "Number[psor]=Plur|POS=DET|Person=3|Poss=Yes|PronType=Prs":"Number[psor]=Plur|Person=3|Poss=Yes|PronType=Prs",
+ "POS=VERB|Tense=Pres|VerbForm=Part":"Tense=Pres|VerbForm=Part",
+ "Case=Nom|Number=Plur|POS=PRON|Person=3|PronType=Prs":"Case=Nom|Number=Plur|Person=3|PronType=Prs",
  "Case=Gen|Definite=Def|Gender=Com|Number=Sing|POS=NOUN":"Case=Gen|Definite=Def|Gender=Com|Number=Sing",
+ "Definite=Def|Degree=Sup|Number=Plur|POS=ADJ":"Definite=Def|Degree=Sup|Number=Plur",
+ "Case=Acc|Number=Plur|POS=PRON|Person=3|PronType=Prs":"Case=Acc|Number=Plur|Person=3|PronType=Prs",
  "POS=AUX|VerbForm=Inf|Voice=Act":"VerbForm=Inf|Voice=Act",
+ "Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ":"Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing",
+ "Definite=Ind|Degree=Cmp|Number=Sing|POS=ADJ":"Definite=Ind|Degree=Cmp|Number=Sing",
+ "Degree=Cmp|POS=ADJ":"Degree=Cmp",
+ "POS=PRON|PartType=Inf":"PartType=Inf",
+ "Definite=Ind|Degree=Pos|Number=Sing|POS=ADJ":"Definite=Ind|Degree=Pos|Number=Sing",
+ "Case=Nom|Gender=Com|POS=PRON|PronType=Ind":"Case=Nom|Gender=Com|PronType=Ind",
+ "Number=Plur|POS=PRON|PronType=Ind":"Number=Plur|PronType=Ind",
+ "POS=INTJ":"",
  "Gender=Com|Number=Sing|POS=DET|PronType=Dem":"Gender=Com|Number=Sing|PronType=Dem",
+ "Case=Gen|Number=Plur|POS=DET|PronType=Ind":"Case=Gen|Number=Plur|PronType=Ind",
+ "Mood=Ind|POS=VERB|Tense=Pres|VerbForm=Fin|Voice=Pass":"Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Pass",
+ "Definite=Def|Gender=Neut|Number=Plur|POS=NOUN":"Definite=Def|Gender=Neut|Number=Plur",
+ "Degree=Cmp|POS=ADV":"Degree=Cmp",
+ "Number=Plur|Number[psor]=Plur|POS=PRON|Person=1|Poss=Yes|PronType=Prs|Style=Form":"Number=Plur|Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs|Style=Form",
+ "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs":"Case=Acc|Gender=Com|Number=Sing|Person=3|PronType=Prs",
+ "Number=Plur|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":"Number=Plur|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
+ "Case=Gen|POS=PROPN":"Case=Gen",
+ "Gender=Neut|Number=Sing|POS=PRON|PronType=Ind":"Gender=Neut|Number=Sing|PronType=Ind",
+ "Number=Plur|POS=VERB|Tense=Past|VerbForm=Part":"Number=Plur|Tense=Past|VerbForm=Part",
+ "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":"Gender=Neut|Number=Sing|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
+ "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs":"Case=Acc|Gender=Com|Number=Sing|Person=1|PronType=Prs",
+ "Definite=Def|Degree=Sup|POS=ADJ":"Definite=Def|Degree=Sup",
  "Gender=Neut|Number=Sing|POS=DET|PronType=Ind":"Gender=Neut|Number=Sing|PronType=Ind",
+ "Case=Gen|Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN":"Case=Gen|Definite=Ind|Gender=Neut|Number=Sing",
+ "Gender=Neut|Number=Sing|POS=DET|PronType=Dem":"Gender=Neut|Number=Sing|PronType=Dem",
+ "Definite=Def|Number=Sing|POS=VERB|Tense=Past|VerbForm=Part":"Definite=Def|Number=Sing|Tense=Past|VerbForm=Part",
+ "POS=PRON|PronType=Dem":"PronType=Dem",
+ "Degree=Pos|Gender=Com|Number=Sing|POS=ADJ":"Degree=Pos|Gender=Com|Number=Sing",
+ "Number=Plur|POS=NUM":"Number=Plur",
+ "POS=VERB|VerbForm=Inf|Voice=Pass":"VerbForm=Inf|Voice=Pass",
+ "Definite=Def|Degree=Sup|Number=Sing|POS=ADJ":"Definite=Def|Degree=Sup|Number=Sing",
+ "Number=Sing|POS=PRON|PronType=Int,Rel":"Number=Sing|PronType=Int,Rel",
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs":"Case=Nom|Gender=Com|Number=Sing|Person=1|PronType=Prs",
+ "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":"Gender=Neut|Number=Sing|Number[psor]=Sing|Person=1|Poss=Yes|PronType=Prs",
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":"Gender=Com|Number=Sing|Number[psor]=Sing|Person=1|Poss=Yes|PronType=Prs",
+ "POS=PRON":"",
+ "Definite=Ind|Number=Sing|POS=NOUN":"Definite=Ind|Number=Sing",
+ "Definite=Ind|Number=Sing|POS=NUM":"Definite=Ind|Number=Sing",
+ "Case=Gen|Definite=Ind|Gender=Com|Number=Sing|POS=NOUN":"Case=Gen|Definite=Ind|Gender=Com|Number=Sing",
+ "Foreign=Yes|POS=ADV":"Foreign=Yes",
+ "POS=NOUN":"",
+ "Case=Gen|Definite=Def|Gender=Neut|Number=Sing|POS=NOUN":"Case=Gen|Definite=Def|Gender=Neut|Number=Sing",
+ "Gender=Com|Number=Plur|POS=NOUN":"Gender=Com|Number=Plur",
+ "Gender=Neut|Number=Sing|POS=PRON|PronType=Int,Rel":"Gender=Neut|Number=Sing|PronType=Int,Rel",
  "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs":"Case=Nom|Gender=Com|Number=Plur|Person=1|PronType=Prs",
+ "Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs":"Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs",
+ "Gender=Com|Number=Sing|POS=PRON|PronType=Ind":"Gender=Com|Number=Sing|PronType=Ind",
+ "Case=Gen|Definite=Ind|Gender=Com|Number=Plur|POS=NOUN":"Case=Gen|Definite=Ind|Gender=Com|Number=Plur",
+ "Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ":"Degree=Pos|Gender=Neut|Number=Sing",
+ "Degree=Sup|POS=ADJ":"Degree=Sup",
+ "Degree=Pos|Number=Sing|POS=ADJ":"Degree=Pos|Number=Sing",
+ "Mood=Imp|POS=VERB":"Mood=Imp",
+ "Case=Nom|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs":"Case=Nom|Gender=Com|Person=2|Polite=Form|PronType=Prs",
+ "Case=Acc|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs":"Case=Acc|Gender=Com|Person=2|Polite=Form|PronType=Prs",
+ "POS=X":"",
  "Case=Gen|Definite=Def|Gender=Com|Number=Plur|POS=NOUN":"Case=Gen|Definite=Def|Gender=Com|Number=Plur",
  "Number=Plur|POS=PRON|PronType=Dem":"Number=Plur|PronType=Dem",
+ "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs":"Case=Acc|Gender=Com|Number=Plur|Person=1|PronType=Prs",
+ "Number=Plur|POS=PRON|PronType=Int,Rel":"Number=Plur|PronType=Int,Rel",
+ "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":"Gender=Com|Number=Sing|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
  "Degree=Cmp|Number=Plur|POS=ADJ":"Degree=Cmp|Number=Plur",
+ "Number=Plur|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":"Number=Plur|Number[psor]=Sing|Person=1|Poss=Yes|PronType=Prs",
  "Gender=Com|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":"Gender=Com|Number=Sing|Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs|Style=Form",
+ "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs":"Case=Nom|Gender=Com|Number=Sing|Person=2|PronType=Prs",
+ "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs":"Case=Acc|Gender=Com|Number=Sing|Person=2|PronType=Prs",
+ "Gender=Com|POS=PRON|PronType=Int,Rel":"Gender=Com|PronType=Int,Rel",
+ "Case=Gen|Degree=Pos|Number=Plur|POS=ADJ":"Case=Gen|Degree=Pos|Number=Plur",
+ "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":"Gender=Neut|Number=Sing|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
+ "POS=VERB|VerbForm=Ger":"VerbForm=Ger",
+ "Gender=Com|Number=Sing|POS=PRON|PronType=Dem":"Gender=Com|Number=Sing|PronType=Dem",
+ "Case=Gen|POS=PRON|PronType=Int,Rel":"Case=Gen|PronType=Int,Rel",
+ "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Pass":"Mood=Ind|Tense=Past|VerbForm=Fin|Voice=Pass",
+ "Abbr=Yes|POS=X":"Abbr=Yes",
+ "Case=Gen|Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN":"Case=Gen|Definite=Ind|Gender=Neut|Number=Plur",
+ "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":"Gender=Com|Number=Sing|Number[psor]=Sing|Person=2|Poss=Yes|PronType=Prs",
+ "Definite=Ind|Number=Plur|POS=NOUN":"Definite=Ind|Number=Plur",
+ "Foreign=Yes|POS=X":"Foreign=Yes",
  "Number=Plur|POS=PRON|PronType=Rcp":"Number=Plur|PronType=Rcp",
+ "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs":"Case=Nom|Gender=Com|Number=Plur|Person=2|PronType=Prs",
  "Case=Gen|Degree=Cmp|POS=ADJ":"Case=Gen|Degree=Cmp",
  "Case=Gen|Definite=Def|Gender=Neut|Number=Plur|POS=NOUN":"Case=Gen|Definite=Def|Gender=Neut|Number=Plur",
+ "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs":"Case=Acc|Gender=Com|Number=Plur|Person=2|PronType=Prs",
+ "Gender=Neut|Number=Sing|POS=PRON|PronType=Dem":"Gender=Neut|Number=Sing|PronType=Dem",
  "Number=Plur|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":"Number=Plur|Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs|Style=Form",
+ "Gender=Neut|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":"Gender=Neut|Number=Sing|Number[psor]=Plur|Person=1|Poss=Yes|PronType=Prs|Style=Form",
  "Number=Plur|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":"Number=Plur|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes",
+ "Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs":"Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs",
+ "Case=Gen|Number=Plur|POS=PRON|PronType=Rcp":"Case=Gen|Number=Plur|PronType=Rcp",
  "POS=DET|Person=2|Polite=Form|Poss=Yes|PronType=Prs":"Person=2|Polite=Form|Poss=Yes|PronType=Prs",
+ "POS=SYM":"",
+ "POS=DET|PronType=Dem":"PronType=Dem",
+ "Gender=Com|Number=Sing|POS=NUM":"Gender=Com|Number=Sing",
+ "Number[psor]=Plur|POS=DET|Person=2|Poss=Yes|PronType=Prs":"Number[psor]=Plur|Person=2|Poss=Yes|PronType=Prs",
+ "Case=Gen|Number=Plur|POS=VERB|Tense=Past|VerbForm=Part":"Case=Gen|Number=Plur|Tense=Past|VerbForm=Part",
  "Definite=Def|Degree=Abs|POS=ADJ":"Definite=Def|Degree=Abs",
+ "POS=VERB|Tense=Pres":"Tense=Pres",
+ "Definite=Ind|Gender=Neut|Number=Sing|POS=NUM":"Definite=Ind|Gender=Neut|Number=Sing",
  "Degree=Abs|POS=ADV":"Degree=Abs",
  "Case=Gen|Definite=Def|Degree=Pos|Number=Sing|POS=ADJ":"Case=Gen|Definite=Def|Degree=Pos|Number=Sing",
  "Gender=Com|Number=Sing|POS=PRON|PronType=Int,Rel":"Gender=Com|Number=Sing|PronType=Int,Rel",
+ "POS=VERB|Tense=Past|VerbForm=Part":"Tense=Past|VerbForm=Part",
+ "Definite=Ind|Degree=Sup|Number=Sing|POS=ADJ":"Definite=Ind|Degree=Sup|Number=Sing",
  "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":"Gender=Neut|Number=Sing|Number[psor]=Sing|Person=2|Poss=Yes|PronType=Prs",
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=PRON|Person=1|Poss=Yes|PronType=Prs":"Gender=Com|Number=Sing|Number[psor]=Sing|Person=1|Poss=Yes|PronType=Prs",
  "Number=Plur|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":"Number=Plur|Number[psor]=Sing|Person=2|Poss=Yes|PronType=Prs",
+ "Number[psor]=Plur|POS=PRON|Person=3|Poss=Yes|PronType=Prs":"Number[psor]=Plur|Person=3|Poss=Yes|PronType=Prs",
+ "Definite=Ind|POS=NOUN":"Definite=Ind",
  "Case=Gen|Gender=Com|Number=Sing|POS=DET|PronType=Ind":"Case=Gen|Gender=Com|Number=Sing|PronType=Ind",
+ "Definite=Ind|Gender=Com|Number=Sing|POS=NUM":"Definite=Ind|Gender=Com|Number=Sing",
+ "Definite=Def|Number=Plur|POS=NOUN":"Definite=Def|Number=Plur",
  "Case=Gen|POS=NOUN":"Case=Gen",
+ "POS=AUX|Tense=Pres|VerbForm=Part":"Tense=Pres|VerbForm=Part"
  },
  "labels_pos":{
  "AdpType=Prep|POS=ADP":85,
 
  "Degree=Pos|Number=Plur|POS=ADJ":84,
  "Definite=Ind|Gender=Com|Number=Plur|POS=NOUN":92,
  "POS=PUNCT":97,
+ "NumType=Ord|POS=ADJ":84,
  "POS=CCONJ":89,
  "Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN":92,
+ "POS=VERB|VerbForm=Inf|Voice=Act":100,
+ "Case=Acc|Gender=Neut|Number=Sing|POS=PRON|Person=3|PronType=Prs":95,
+ "Degree=Sup|POS=ADV":86,
  "Degree=Pos|POS=ADV":86,
+ "Gender=Com|Number=Sing|POS=DET|PronType=Ind":90,
+ "Number=Plur|POS=DET|PronType=Ind":90,
+ "POS=ADP":85,
+ "POS=ADV|PartType=Inf":86,
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs":95,
  "Mood=Ind|POS=AUX|Tense=Past|VerbForm=Fin|Voice=Act":87,
+ "Definite=Def|Degree=Pos|Number=Sing|POS=ADJ":84,
+ "Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs":90,
  "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Act":100,
  "POS=ADP|PartType=Inf":85,
+ "Definite=Ind|Degree=Pos|Gender=Com|Number=Sing|POS=ADJ":84,
+ "NumType=Card|POS=NUM":93,
  "Degree=Pos|POS=ADJ":84,
+ "Definite=Ind|Number=Sing|POS=AUX|Tense=Past|VerbForm=Part":87,
+ "POS=PART|PartType=Inf":94,
+ "Case=Acc|POS=PRON|Person=3|PronType=Prs|Reflex=Yes":95,
  "Definite=Def|Gender=Com|Number=Plur|POS=NOUN":92,
+ "Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN":92,
+ "Number[psor]=Plur|POS=DET|Person=3|Poss=Yes|PronType=Prs":90,
+ "POS=VERB|Tense=Pres|VerbForm=Part":100,
+ "Case=Nom|Number=Plur|POS=PRON|Person=3|PronType=Prs":95,
  "Case=Gen|Definite=Def|Gender=Com|Number=Sing|POS=NOUN":92,
+ "Definite=Def|Degree=Sup|Number=Plur|POS=ADJ":84,
+ "Case=Acc|Number=Plur|POS=PRON|Person=3|PronType=Prs":95,
  "POS=AUX|VerbForm=Inf|Voice=Act":87,
+ "Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ":84,
+ "Definite=Ind|Degree=Cmp|Number=Sing|POS=ADJ":84,
+ "Degree=Cmp|POS=ADJ":84,
+ "POS=PRON|PartType=Inf":95,
+ "Definite=Ind|Degree=Pos|Number=Sing|POS=ADJ":84,
+ "Case=Nom|Gender=Com|POS=PRON|PronType=Ind":95,
+ "Number=Plur|POS=PRON|PronType=Ind":95,
+ "POS=INTJ":91,
  "Gender=Com|Number=Sing|POS=DET|PronType=Dem":90,
+ "Case=Gen|Number=Plur|POS=DET|PronType=Ind":90,
+ "Mood=Ind|POS=VERB|Tense=Pres|VerbForm=Fin|Voice=Pass":100,
+ "Definite=Def|Gender=Neut|Number=Plur|POS=NOUN":92,
+ "Degree=Cmp|POS=ADV":86,
+ "Number=Plur|Number[psor]=Plur|POS=PRON|Person=1|Poss=Yes|PronType=Prs|Style=Form":95,
224
+ "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs":95,
225
+ "Number=Plur|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":90,
226
+ "Case=Gen|POS=PROPN":96,
227
+ "Gender=Neut|Number=Sing|POS=PRON|PronType=Ind":95,
228
+ "Number=Plur|POS=VERB|Tense=Past|VerbForm=Part":100,
229
+ "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":90,
230
+ "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs":95,
231
+ "Definite=Def|Degree=Sup|POS=ADJ":84,
232
  "Gender=Neut|Number=Sing|POS=DET|PronType=Ind":90,
233
+ "Case=Gen|Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN":92,
234
+ "Gender=Neut|Number=Sing|POS=DET|PronType=Dem":90,
235
+ "Definite=Def|Number=Sing|POS=VERB|Tense=Past|VerbForm=Part":100,
236
+ "POS=PRON|PronType=Dem":95,
237
+ "Degree=Pos|Gender=Com|Number=Sing|POS=ADJ":84,
238
+ "Number=Plur|POS=NUM":93,
239
+ "POS=VERB|VerbForm=Inf|Voice=Pass":100,
240
+ "Definite=Def|Degree=Sup|Number=Sing|POS=ADJ":84,
241
+ "Number=Sing|POS=PRON|PronType=Int,Rel":95,
242
  "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs":95,
243
+ "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":90,
244
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":90,
245
+ "POS=PRON":95,
246
+ "Definite=Ind|Number=Sing|POS=NOUN":92,
247
+ "Definite=Ind|Number=Sing|POS=NUM":93,
248
+ "Case=Gen|Definite=Ind|Gender=Com|Number=Sing|POS=NOUN":92,
249
+ "Foreign=Yes|POS=ADV":86,
250
+ "POS=NOUN":92,
251
+ "Case=Gen|Definite=Def|Gender=Neut|Number=Sing|POS=NOUN":92,
252
+ "Gender=Com|Number=Plur|POS=NOUN":92,
253
+ "Gender=Neut|Number=Sing|POS=PRON|PronType=Int,Rel":95,
254
  "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs":95,
255
+ "Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs":90,
256
+ "Gender=Com|Number=Sing|POS=PRON|PronType=Ind":95,
257
+ "Case=Gen|Definite=Ind|Gender=Com|Number=Plur|POS=NOUN":92,
258
+ "Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ":84,
259
+ "Degree=Sup|POS=ADJ":84,
260
+ "Degree=Pos|Number=Sing|POS=ADJ":84,
261
+ "Mood=Imp|POS=VERB":100,
262
+ "Case=Nom|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs":95,
263
+ "Case=Acc|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs":95,
264
+ "POS=X":101,
265
  "Case=Gen|Definite=Def|Gender=Com|Number=Plur|POS=NOUN":92,
266
  "Number=Plur|POS=PRON|PronType=Dem":95,
267
+ "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs":95,
268
+ "Number=Plur|POS=PRON|PronType=Int,Rel":95,
269
+ "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":90,
270
  "Degree=Cmp|Number=Plur|POS=ADJ":84,
271
+ "Number=Plur|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs":90,
272
  "Gender=Com|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":90,
273
+ "Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs":95,
274
+ "Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs":95,
275
+ "Gender=Com|POS=PRON|PronType=Int,Rel":95,
276
+ "Case=Gen|Degree=Pos|Number=Plur|POS=ADJ":84,
277
+ "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":95,
278
+ "POS=VERB|VerbForm=Ger":100,
279
+ "Gender=Com|Number=Sing|POS=PRON|PronType=Dem":95,
280
+ "Case=Gen|POS=PRON|PronType=Int,Rel":95,
281
+ "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Pass":100,
282
+ "Abbr=Yes|POS=X":101,
283
+ "Case=Gen|Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN":92,
284
+ "Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":90,
285
+ "Definite=Ind|Number=Plur|POS=NOUN":92,
286
+ "Foreign=Yes|POS=X":101,
287
  "Number=Plur|POS=PRON|PronType=Rcp":95,
288
+ "Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs":95,
289
  "Case=Gen|Degree=Cmp|POS=ADJ":84,
290
  "Case=Gen|Definite=Def|Gender=Neut|Number=Plur|POS=NOUN":92,
291
+ "Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs":95,
292
+ "Gender=Neut|Number=Sing|POS=PRON|PronType=Dem":95,
293
  "Number=Plur|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":90,
294
+ "Gender=Neut|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form":90,
295
  "Number=Plur|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes":95,
296
+ "Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs":95,
297
+ "Case=Gen|Number=Plur|POS=PRON|PronType=Rcp":95,
298
  "POS=DET|Person=2|Polite=Form|Poss=Yes|PronType=Prs":90,
299
+ "POS=SYM":99,
300
+ "POS=DET|PronType=Dem":90,
301
+ "Gender=Com|Number=Sing|POS=NUM":93,
302
+ "Number[psor]=Plur|POS=DET|Person=2|Poss=Yes|PronType=Prs":90,
303
+ "Case=Gen|Number=Plur|POS=VERB|Tense=Past|VerbForm=Part":100,
304
  "Definite=Def|Degree=Abs|POS=ADJ":84,
305
+ "POS=VERB|Tense=Pres":100,
306
+ "Definite=Ind|Gender=Neut|Number=Sing|POS=NUM":93,
307
  "Degree=Abs|POS=ADV":86,
308
  "Case=Gen|Definite=Def|Degree=Pos|Number=Sing|POS=ADJ":84,
309
  "Gender=Com|Number=Sing|POS=PRON|PronType=Int,Rel":95,
310
+ "POS=VERB|Tense=Past|VerbForm=Part":100,
311
+ "Definite=Ind|Degree=Sup|Number=Sing|POS=ADJ":84,
312
  "Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":90,
313
  "Gender=Com|Number=Sing|Number[psor]=Sing|POS=PRON|Person=1|Poss=Yes|PronType=Prs":95,
314
  "Number=Plur|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs":90,
315
+ "Number[psor]=Plur|POS=PRON|Person=3|Poss=Yes|PronType=Prs":95,
316
+ "Definite=Ind|POS=NOUN":92,
317
  "Case=Gen|Gender=Com|Number=Sing|POS=DET|PronType=Ind":90,
318
+ "Definite=Ind|Gender=Com|Number=Sing|POS=NUM":93,
319
+ "Definite=Def|Number=Plur|POS=NOUN":92,
320
  "Case=Gen|POS=NOUN":92,
321
+ "POS=AUX|Tense=Pres|VerbForm=Part":87
322
+ },
323
+ "overwrite":true
324
  }
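In the map above, each key is a Universal Dependencies FEATS string with the coarse tag folded in as `POS=…`, and each value looks like the integer ID of that coarse tag in spaCy's symbol table: every `POS=NOUN` key maps to 92, every `POS=VERB` key to 100, and so on. The sketch below decodes one entry and checks it against its ID. The `POS_IDS` table and both helpers are hypothetical, reconstructed only from the values visible in this diff; with spaCy installed, the IDs can be cross-checked against `spacy.symbols`.

```python
# Hypothetical ID table, reconstructed from the values in this diff
# (e.g. every "POS=NOUN" key maps to 92, every "POS=ADP" key to 85).
POS_IDS = {
    84: "ADJ", 85: "ADP", 86: "ADV", 87: "AUX", 90: "DET", 91: "INTJ",
    92: "NOUN", 93: "NUM", 94: "PART", 95: "PRON", 96: "PROPN",
    99: "SYM", 100: "VERB", 101: "X",
}


def parse_feats(feats: str) -> dict[str, str]:
    """Split a UD FEATS string such as 'Definite=Ind|POS=NOUN' into a dict."""
    pairs = (item.split("=", 1) for item in feats.split("|") if item)
    return {key: value for key, value in pairs}


def check_entry(feats: str, pos_id: int) -> bool:
    """Return True if the POS folded into the FEATS string matches the ID."""
    return parse_feats(feats).get("POS") == POS_IDS.get(pos_id)


entry = "Case=Gen|Definite=Def|Gender=Com|Number=Sing|POS=NOUN"
print(parse_feats(entry)["Gender"])  # Com
print(check_entry(entry, 92))        # True
```

Multi-valued features such as `PronType=Int,Rel` survive unchanged because each pair is split only on its first `=`.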
morphologizer/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:3ff4ac19d2ea3ccfe10ad70e8cbcf15b22916b18b08d7d8efa3990be0323bf78
3
- size 483528
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6b321cc35401b6a7fa432a60b6e203624085a924ce2664531e21f86f79a5a301
3
+ size 486656
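The `model` entries in this commit are Git LFS pointer files, not the weights themselves: a `version` line, an `oid` holding the SHA-256 digest of the stored object, and that object's `size` in bytes. A minimal parser for this three-line layout, written as a hypothetical helper rather than part of any Git LFS tooling, also shows how much the morphologizer weights grew:

```python
import re


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # each line is "key value"
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    # Expect a sha256 digest written as 64 lowercase hex characters.
    if algo != "sha256" or not re.fullmatch(r"[0-9a-f]{64}", digest):
        raise ValueError(f"unexpected oid: {fields['oid']}")
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:6b321cc35401b6a7fa432a60b6e203624085a924ce2664531e21f86f79a5a301
size 486656"""
info = parse_lfs_pointer(pointer)
print(info["size"] - 483528)  # growth of morphologizer/model in bytes: 3128
```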
ner/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:a3ecfb5d8bd519dd27a53f8f5eb525d46bd9fe2d60fd3173db955f0896317979
3
  size 225962
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:70740eac699634c75cf5a480bedf1a47a8dc32dc2dfc500a80b91b94b3600d50
3
  size 225962
ner/moves CHANGED
@@ -1 +1 @@
1
- ��moves��{"0":{},"1":{"PER":2146,"MISC":1273,"ORG":1267,"LOC":1144},"2":{"PER":2146,"MISC":1273,"ORG":1267,"LOC":1144},"3":{"PER":2146,"MISC":1273,"ORG":1267,"LOC":1144},"4":{"PER":2146,"MISC":1273,"ORG":1267,"LOC":1144,"":1},"5":{"":1}}�cfg��neg_key�
 
1
+ ��moves��{"0":{},"1":{"PER":1361,"ORG":943,"MISC":826,"LOC":768},"2":{"PER":1361,"ORG":943,"MISC":826,"LOC":768},"3":{"PER":1361,"ORG":943,"MISC":826,"LOC":768},"4":{"PER":1361,"ORG":943,"MISC":826,"LOC":768,"":1},"5":{"":1}}�cfg��neg_key�
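The `ner/moves` file appears to be msgpack, which is why the viewer renders its framing bytes as replacement characters, but the embedded JSON is readable. Its top-level keys appear to index spaCy's BILUO-style NER actions, assumed here to be ordered MISSING, BEGIN, IN, LAST, UNIT, OUT, with per-label counts under each action. The sketch below diffs the BEGIN counts of the old and new files; the action order is an assumption to confirm against the installed spaCy version, and the JSON strings are excerpts of the payloads above.

```python
import json

# Assumed order of spaCy's NER actions; verify against your spaCy version.
ACTIONS = ["MISSING", "BEGIN", "IN", "LAST", "UNIT", "OUT"]

# Excerpts (actions "0" and "1" only) of the old and new ner/moves JSON.
OLD_MOVES = '{"0":{},"1":{"PER":2146,"MISC":1273,"ORG":1267,"LOC":1144}}'
NEW_MOVES = '{"0":{},"1":{"PER":1361,"ORG":943,"MISC":826,"LOC":768}}'


def label_deltas(old_json: str, new_json: str, action: int) -> dict[str, int]:
    """Return new-minus-old counts per label for one action, sorted by label."""
    old = json.loads(old_json)[str(action)]
    new = json.loads(new_json)[str(action)]
    labels = sorted(set(old) | set(new))
    return {label: new.get(label, 0) - old.get(label, 0) for label in labels}


deltas = label_deltas(OLD_MOVES, NEW_MOVES, action=ACTIONS.index("BEGIN"))
print(deltas)  # {'LOC': -376, 'MISC': -447, 'ORG': -324, 'PER': -785}
```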
parser/model CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:91770681cc7c9adf240c0d7dda5b09a3a62ccf798e939366da0c351d0f080680
3
- size 456157
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:52eafb860a09884bf0ab0357223f105e97cfc1f6932ee3214d1a67b4a807d331
3
+ size 1088787
parser/moves CHANGED
@@ -1 +1 @@
1
- ��moves�2{"0":{"":41514},"1":{"":34292},"2":{"case":7489,"nsubj":6009,"det":4334,"amod":3968,"advmod":3657,"mark":3529,"aux":2432,"cc":2261,"punct":2182,"cop":1329,"obl":894,"nummod":799,"nmod:poss":651,"nmod":460,"expl":291,"ccomp":202,"obj":195,"xcomp":122,"case||nmod":73,"obl:tmod":53,"dep":49,"acl:relcl":43},"3":{"punct":8600,"obl":3949,"obj":3758,"nmod":3565,"conj":2743,"advmod":2095,"flat":1294,"nsubj":1172,"acl:relcl":1131,"advcl":808,"amod":629,"obl:loc":467,"fixed":390,"dep":322,"xcomp":272,"appos":268,"compound:prt":261,"ccomp":252,"acl:relcl||nsubj":237,"case":202,"nummod":167,"list":161,"nmod:poss":156,"punct||conj":151,"mark":137,"cc":135,"iobj":107,"expl":77,"cop":69,"nmod||case":60,"aux":48,"obl:tmod":45,"cc||case":43,"advcl||advmod":43,"cc||conj":40,"case||obl":38,"punct||case":33},"4":{"ROOT":4367}}�cfg��neg_key�
 
1
+ ��moves��{"0":{"":30710},"1":{"":22084},"2":{"case":5238,"nsubj":4163,"punct":3257,"det":3028,"amod":2815,"advmod":2482,"mark":2317,"aux":1748,"cc":1610,"cop":823,"obl":627,"nummod":620,"nmod:poss":457,"nmod":384,"expl":193,"obj":188,"ccomp":155,"advcl":110,"xcomp":81,"case||nmod":45,"dep":32,"obl:tmod":31},"3":{"punct":4355,"obl":2759,"obj":2659,"nmod":2503,"conj":1923,"advmod":1246,"flat":886,"nsubj":805,"acl:relcl":800,"advcl":744,"amod":415,"xcomp":307,"advmod:lmod":273,"fixed":267,"dep":218,"compound:prt":211,"appos":187,"ccomp":177,"acl:relcl||nsubj":144,"case":130,"nmod:poss":112,"mark":103,"iobj":99,"nummod":93,"list":86,"cc":72,"expl":55,"cop":40,"obl:lmod":35,"obl:tmod":34,"cc||case":31},"4":{"ROOT":2970}}�cfg��neg_key�
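The parser's `moves` file follows the same layout. Its keys appear to be spaCy's arc-eager actions (SHIFT, REDUCE, LEFT, RIGHT, BREAK), and labels containing `||`, such as `acl:relcl||nsubj`, look like pseudo-projective decorations in which the part after `||` is the relation of the token's original head before a non-projective arc was lifted. A small sketch that splits those decorated labels, using an excerpt of the new RIGHT-arc counts; `decompose` here is a hypothetical stand-in, not spaCy's own function:

```python
# Assumed order of spaCy's arc-eager actions; verify against your spaCy version.
ACTIONS = ["SHIFT", "REDUCE", "LEFT", "RIGHT", "BREAK"]
DELIMITER = "||"  # separator assumed for pseudo-projective label decoration


def decompose(label: str) -> tuple[str, str]:
    """Split 'dep||head_dep' into (dep, head_dep); plain labels get head_dep ''."""
    dep, _, head_dep = label.partition(DELIMITER)
    return dep, head_dep


# Excerpt of the new action "3" (RIGHT) label counts from parser/moves.
right = {"punct": 4355, "acl:relcl": 800, "acl:relcl||nsubj": 144, "cc||case": 31}
lifted = {label: n for label, n in right.items() if DELIMITER in label}
for label, count in lifted.items():
    dep, head_dep = decompose(label)
    print(f"{ACTIONS[3]} {dep}, original head attached as {head_dep}: {count}")
```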
span_resolver/cfg ADDED
@@ -0,0 +1,3 @@
1
+ {
2
+ "nI":768
3
+ }
span_resolver/model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f92b305454b1a26c0d831c6470c36decb6c44ade1d676aa9733ff99865f3926b
3
+ size 7712965
tagger/cfg ADDED
@@ -0,0 +1,23 @@
1
+ {
2
+ "labels":[
3
+ "ADJ",
4
+ "ADP",
5
+ "ADV",
6
+ "AUX",
7
+ "CCONJ",
8
+ "DET",
9
+ "INTJ",
10
+ "NOUN",
11
+ "NUM",
12
+ "PART",
13
+ "PRON",
14
+ "PROPN",
15
+ "PUNCT",
16
+ "SCONJ",
17
+ "SYM",
18
+ "VERB",
19
+ "X"
20
+ ],
21
+ "neg_prefix":"!",
22
+ "overwrite":false
23
+ }
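The new `tagger` component's label set is exactly the 17 Universal Dependencies v2 UPOS tags. A quick check, with the `cfg` contents inlined as a string so the snippet runs without the model files:

```python
import json

# The 17 universal POS tags defined by Universal Dependencies v2.
UD_UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

# Contents of tagger/cfg as added in this commit.
TAGGER_CFG = """{
  "labels": ["ADJ","ADP","ADV","AUX","CCONJ","DET","INTJ","NOUN","NUM",
             "PART","PRON","PROPN","PUNCT","SCONJ","SYM","VERB","X"],
  "neg_prefix": "!",
  "overwrite": false
}"""

cfg = json.loads(TAGGER_CFG)
labels = set(cfg["labels"])
print(len(labels), labels == UD_UPOS)  # 17 True
```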
tagger/model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fe76ce222162041e521865bb1fc6a9fc0a5d63f1882bb281bc68c66c10c39086
3
+ size 52932
tokenizer CHANGED
The diff for this file is too large to render.
 
trainable_lemmatizer/cfg ADDED
@@ -0,0 +1,348 @@
1
+ {
2
+ "labels":[
3
+ 1,
4
+ 2,
5
+ 4,
6
+ 6,
7
+ 8,
8
+ 10,
9
+ 12,
10
+ 14,
11
+ 16,
12
+ 18,
13
+ 20,
14
+ 24,
15
+ 26,
16
+ 29,
17
+ 30,
18
+ 34,
19
+ 36,
20
+ 38,
21
+ 42,
22
+ 44,
23
+ 46,
24
+ 48,
25
+ 50,
26
+ 52,
27
+ 54,
28
+ 56,
29
+ 57,
30
+ 60,
31
+ 63,
32
+ 65,
33
+ 67,
34
+ 69,
35
+ 71,
36
+ 73,
37
+ 75,
38
+ 76,
39
+ 78,
40
+ 81,
41
+ 83,
42
+ 84,
43
+ 86,
44
+ 88,
45
+ 92,
46
+ 96,
47
+ 98,
48
+ 100,
49
+ 103,
50
+ 106,
51
+ 108,
52
+ 110,
53
+ 113,
54
+ 115,
55
+ 117,
56
+ 119,
57
+ 121,
58
+ 124,
59
+ 125,
60
+ 127,
61
+ 129,
62
+ 131,
63
+ 133,
64
+ 134,
65
+ 138,
66
+ 140,
67
+ 142,
68
+ 144,
69
+ 146,
70
+ 148,
71
+ 151,
72
+ 153,
73
+ 155,
74
+ 156,
75
+ 159,
76
+ 160,
77
+ 162,
78
+ 164,
79
+ 166,
80
+ 167,
81
+ 168,
82
+ 170,
83
+ 172,
84
+ 175,
85
+ 177,
86
+ 180,
87
+ 182,
88
+ 185,
89
+ 188,
90
+ 190,
91
+ 191,
92
+ 194,
93
+ 197,
94
+ 199,
95
+ 201,
96
+ 205,
97
+ 208,
98
+ 211,
99
+ 212,
100
+ 213,
101
+ 215,
102
+ 217,
103
+ 219,
104
+ 220,
105
+ 221,
106
+ 223,
107
+ 224,
108
+ 226,
109
+ 229,
110
+ 231,
111
+ 232,
112
+ 233,
113
+ 236,
114
+ 238,
115
+ 240,
116
+ 242,
117
+ 244,
118
+ 246,
119
+ 249,
120
+ 250,
121
+ 252,
122
+ 255,
123
+ 256,
124
+ 257,
125
+ 228,
126
+ 259,
127
+ 262,
128
+ 264,
129
+ 266,
130
+ 269,
131
+ 271,
132
+ 274,
133
+ 276,
134
+ 279,
135
+ 281,
136
+ 283,
137
+ 284,
138
+ 285,
139
+ 286,
140
+ 288,
141
+ 289,
142
+ 290,
143
+ 291,
144
+ 293,
145
+ 294,
146
+ 297,
147
+ 298,
148
+ 300,
149
+ 302,
150
+ 303,
151
+ 305,
152
+ 307,
153
+ 308,
154
+ 309,
155
+ 311,
156
+ 312,
157
+ 314,
158
+ 316,
159
+ 318,
160
+ 321,
161
+ 322,
162
+ 323,
163
+ 324,
164
+ 325,
165
+ 327,
166
+ 329,
167
+ 331,
168
+ 333,
169
+ 334,
170
+ 336,
171
+ 338,
172
+ 340,
173
+ 341,
174
+ 343,
175
+ 345,
176
+ 348,
177
+ 351,
178
+ 353,
179
+ 355,
180
+ 356,
181
+ 357,
182
+ 360,
183
+ 362,
184
+ 366,
185
+ 368,
186
+ 370,
187
+ 372,
188
+ 374,
189
+ 376,
190
+ 377,
191
+ 379,
192
+ 381,
193
+ 382,
194
+ 383,
195
+ 384,
196
+ 386,
197
+ 388,
198
+ 389,
199
+ 391,
200
+ 392,
201
+ 395,
202
+ 396,
203
+ 398,
204
+ 400,
205
+ 401,
206
+ 402,
207
+ 403,
208
+ 405,
209
+ 407,
210
+ 408,
211
+ 409,
212
+ 410,
213
+ 412,
214
+ 414,
215
+ 415,
216
+ 418,
217
+ 419,
218
+ 421,
219
+ 422,
220
+ 425,
221
+ 427,
222
+ 428,
223
+ 430,
224
+ 432,
225
+ 433,
226
+ 435,
227
+ 436,
228
+ 438,
229
+ 440,
230
+ 442,
231
+ 443,
232
+ 444,
233
+ 445,
234
+ 446,
235
+ 448,
236
+ 451,
237
+ 452,
238
+ 453,
239
+ 455,
240
+ 456,
241
+ 458,
242
+ 461,
243
+ 463,
244
+ 464,
245
+ 466,
246
+ 467,
247
+ 468,
248
+ 469,
249
+ 470,
250
+ 473,
251
+ 475,
252
+ 479,
253
+ 481,
254
+ 482,
255
+ 486,
256
+ 488,
257
+ 490,
258
+ 492,
259
+ 493,
260
+ 495,
261
+ 497,
262
+ 502,
263
+ 503,
264
+ 504,
265
+ 505,
266
+ 508,
267
+ 509,
268
+ 510,
269
+ 511,
270
+ 512,
271
+ 513,
272
+ 514,
273
+ 515,
274
+ 516,
275
+ 518,
276
+ 519,
277
+ 521,
278
+ 522,
279
+ 524,
280
+ 528,
281
+ 529,
282
+ 530,
283
+ 534,
284
+ 535,
285
+ 537,
286
+ 539,
287
+ 542,
288
+ 544,
289
+ 547,
290
+ 549,
291
+ 550,
292
+ 551,
293
+ 553,
294
+ 555,
295
+ 556,
296
+ 557,
297
+ 559,
298
+ 560,
299
+ 562,
300
+ 566,
301
+ 567,
302
+ 568,
303
+ 570,
304
+ 573,
305
+ 576,
306
+ 579,
307
+ 581,
308
+ 583,
309
+ 584,
310
+ 585,
311
+ 587,
312
+ 590,
313
+ 592,
314
+ 595,
315
+ 597,
316
+ 598,
317
+ 599,
318
+ 600,
319
+ 604,
320
+ 606,
321
+ 608,
322
+ 609,
323
+ 611,
324
+ 612,
325
+ 613,
326
+ 614,
327
+ 616,
328
+ 618,
329
+ 619,
330
+ 622,
331
+ 623,
332
+ 624,
333
+ 625,
334
+ 626,
335
+ 627,
336
+ 628,
337
+ 632,
338
+ 635,
339
+ 636,
340
+ 637,
341
+ 639,
342
+ 640,
343
+ 641,
344
+ 642,
345
+ 643,
346
+ 644
347
+ ]
348
+ }
trainable_lemmatizer/model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ba995f69f75ff28dd830b01bda46a9873157e149e8c07c7478b21a36067409f5
3
+ size 1058797
trainable_lemmatizer/trees ADDED
Binary file (68.7 kB).
 
transformer/model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:06442d138b0b97ccd2f16254dabcc1357524d0e6be92dc083a7207194fcf81a9
3
+ size 502755282
transformer/model/config.json DELETED
@@ -1,30 +0,0 @@
1
- {
2
- "_name_or_path": "Maltehb/danish-bert-botxo",
3
- "architectures": [
4
- "BertForPreTraining"
5
- ],
6
- "attention_probs_dropout_prob": 0.1,
7
- "directionality": "bidi",
8
- "gradient_checkpointing": false,
9
- "hidden_act": "gelu",
10
- "hidden_dropout_prob": 0.1,
11
- "hidden_size": 768,
12
- "initializer_range": 0.02,
13
- "intermediate_size": 3072,
14
- "layer_norm_eps": 1e-12,
15
- "max_position_embeddings": 512,
16
- "model_type": "bert",
17
- "num_attention_heads": 12,
18
- "num_hidden_layers": 12,
19
- "pad_token_id": 0,
20
- "pooler_fc_size": 768,
21
- "pooler_num_attention_heads": 12,
22
- "pooler_num_fc_layers": 3,
23
- "pooler_size_per_head": 128,
24
- "pooler_type": "first_token_transform",
25
- "position_embedding_type": "absolute",
26
- "transformers_version": "4.5.1",
27
- "type_vocab_size": 2,
28
- "use_cache": true,
29
- "vocab_size": 32000
30
- }
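The deleted `config.json` describes the previous BERT-base encoder (`Maltehb/danish-bert-botxo`). Its hyperparameters determine the encoder's size; the sketch below applies the standard BERT layer arithmetic (embeddings, 12 encoder layers, pooler) and deliberately leaves out the extra heads that `BertForPreTraining` adds. It is a back-of-the-envelope check, not a figure reported by the model authors; `bert_param_count` is a hypothetical helper.

```python
def bert_param_count(vocab_size: int, hidden: int, layers: int, intermediate: int,
                     max_positions: int, type_vocab: int) -> int:
    """Count BertModel parameters (embeddings + encoder layers + pooler)."""
    layer_norm = 2 * hidden  # LayerNorm gain and bias
    # Word, position and token-type embeddings, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Q, K, V and output projections (weights + biases), then LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers (hidden -> intermediate -> hidden), then LayerNorm.
    feed_forward = (hidden * intermediate + intermediate
                    + intermediate * hidden + hidden + layer_norm)
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


# Values taken from the deleted transformer/model/config.json.
n = bert_param_count(vocab_size=32000, hidden=768, layers=12, intermediate=3072,
                     max_positions=512, type_vocab=2)
print(f"{n:,}")  # 110,617,344
```

With the English `bert-base-uncased` vocabulary of 30,522 tokens, the same formula gives the commonly cited 109,482,240, so the roughly 1.1 million extra parameters here come entirely from the larger Danish vocabulary.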
transformer/model/special_tokens_map.json DELETED
@@ -1 +0,0 @@
1
- {"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
transformer/model/tokenizer_config.json DELETED
@@ -1 +0,0 @@
1
- {"do_lower_case": true, "unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]", "tokenize_chinese_chars": true, "strip_accents": false, "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "Maltehb/danish-bert-botxo", "do_basic_tokenize": true, "never_split": null}
transformer/model/vocab.txt DELETED
The diff for this file is too large to render.
 
vocab/lookups.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:a6f4a94131759bf84baec98b3347bcef57ffb2d6712f7f3b8f611e9ef4b3df35
3
- size 20402
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:76be8b528d0075f7aae98d6fa57a6d3c83ae480a8469e668d7b0af968995ac71
3
+ size 1
vocab/strings.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:5b50a86603f748496e4fd87a8aaa203a32bf82d4b3768bf54187ff40de3ca6f9
3
- size 460120
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:732a4ef43858427e5728dbfe761a5a082708791b280e19b62a949dc86d94d43b
3
+ size 559069
vocab/vectors.cfg ADDED
@@ -0,0 +1,3 @@
1
+ {
2
+ "mode":"default"
3
+ }