alanakbik committed
Commit db56834 · 1 Parent(s): 1c4c9e9

initial commit

Files changed (4)
  1. README.md +158 -0
  2. loss.tsv +21 -0
  3. pytorch_model.bin +3 -0
  4. training.log +892 -0
README.md ADDED
@@ -0,0 +1,158 @@
---
tags:
- flair
- token-classification
- sequence-tagger-model
language: de
datasets:
- conll2003
inference: false
---

## German NER in Flair (large model)

This is the large 4-class NER model for German that ships with [Flair](https://github.com/flairNLP/flair/).

F1-Score: **92.31** (CoNLL-03 German revised)

**Note: this model only works with Flair version 0.8, which will be released in the next few days.**

Predicts 4 tags:

| **tag** | **meaning** |
|---------|-------------|
| PER | person name |
| LOC | location name |
| ORG | organization name |
| MISC | other name |

Based on [document-level XLM-R embeddings](https://www.aclweb.org/anthology/C18-1139/).

---

### Demo: How to use in Flair

Requires: **[Flair](https://github.com/flairNLP/flair/)** (`pip install flair`)

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("flair/ner-german-large")

# make example sentence
sentence = Sentence("George Washington ging nach Washington")

# predict NER tags
tagger.predict(sentence)

# print sentence
print(sentence)

# print predicted NER spans
print('The following NER tags are found:')
# iterate over entities and print
for entity in sentence.get_spans('ner'):
    print(entity)
```

This yields the following output:
```
Span [1,2]: "George Washington" [− Labels: PER (1.0)]
Span [5]: "Washington" [− Labels: LOC (1.0)]
```

So, the entities "*George Washington*" (labeled as a **person**) and "*Washington*" (labeled as a **location**) are found in the sentence "*George Washington ging nach Washington*".
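
Beyond printing, each predicted span exposes its surface text, tag and confidence, which is usually what downstream code needs. A minimal sketch (the `.text`/`.tag`/`.score` accessors assume the Flair 0.8 API):

```python
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load("flair/ner-german-large")

sentence = Sentence("George Washington ging nach Washington")
tagger.predict(sentence)

# each predicted Span carries its surface string, label and confidence
# (.text / .tag / .score as in Flair 0.8; treat these names as an assumption)
for entity in sentence.get_spans('ner'):
    print(entity.text, entity.tag, entity.score)
```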

---

### Training: Script to train this model

The following Flair script was used to train this model:

```python
import torch

# 1. get the corpus
from flair.datasets import CONLL_03_GERMAN

corpus = CONLL_03_GERMAN()

# 2. what tag do we want to predict?
tag_type = 'ner'

# 3. make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# 4. initialize fine-tuneable transformer embeddings WITH document context
from flair.embeddings import TransformerWordEmbeddings

embeddings = TransformerWordEmbeddings(
    model='xlm-roberta-large',
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
    use_context=True,
)

# 5. initialize bare-bones sequence tagger (no CRF, no RNN, no reprojection)
from flair.models import SequenceTagger

tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=tag_dictionary,
    tag_type='ner',
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

# 6. initialize trainer with AdamW optimizer
from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus, optimizer=torch.optim.AdamW)

# 7. run training with XLM parameters (20 epochs, small LR)
from torch.optim.lr_scheduler import OneCycleLR

trainer.train('resources/taggers/ner-german-large',
              learning_rate=5.0e-6,
              mini_batch_size=4,
              mini_batch_chunk_size=1,
              max_epochs=20,
              scheduler=OneCycleLR,
              embeddings_storage_mode='none',
              weight_decay=0.,
)
```
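
When training finishes, Flair saves the trained tagger into the base path passed to `trainer.train`. A minimal sketch for loading it back for inference (`final-model.pt` is Flair's default output file name; treat the exact name as an assumption):

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# load the tagger trained by the script above
# ('final-model.pt' is Flair's default output name; assumption)
tagger = SequenceTagger.load('resources/taggers/ner-german-large/final-model.pt')

sentence = Sentence("Angela Merkel besuchte Berlin")
tagger.predict(sentence)
print(sentence)
```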

---

### Cite

Please cite the following paper when using this model.

```
@misc{schweter2020flert,
    title={FLERT: Document-Level Features for Named Entity Recognition},
    author={Stefan Schweter and Alan Akbik},
    year={2020},
    eprint={2011.06993},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

---

### Issues?

The Flair issue tracker is available [here](https://github.com/flairNLP/flair/issues/).
loss.tsv ADDED
@@ -0,0 +1,21 @@
EPOCH	TIMESTAMP	BAD_EPOCHS	LEARNING_RATE	TRAIN_LOSS
1	23:09:11	4	0.0000	0.32601759576456596
2	23:47:24	4	0.0000	0.2290581286322424
3	00:24:29	4	0.0000	0.18555273314403667
4	01:01:23	4	0.0000	0.1656336001230214
5	01:38:16	4	0.0000	0.1648284967723802
6	02:15:11	4	0.0000	0.16483939256504943
7	02:52:04	4	0.0000	0.16203806226872322
8	03:30:04	4	0.0000	0.1390128146978733
9	04:06:55	4	0.0000	0.1558572274514281
10	04:46:02	4	0.0000	0.1625431115291299
11	05:24:31	4	0.0000	0.14667205465203892
12	06:01:33	4	0.0000	0.14475093385013862
13	06:39:47	4	0.0000	0.15118245752181225
14	07:17:44	4	0.0000	0.14665753430476344
15	07:55:53	4	0.0000	0.14730402247343105
16	08:35:02	4	0.0000	0.14555113955140297
17	09:14:10	4	0.0000	0.14034509936848258
18	09:46:00	4	0.0000	0.14482688813742225
19	10:18:27	4	0.0000	0.1385989190499177
20	10:50:38	4	0.0000	0.13479246194568445
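
For a quick look at the training curve, this file can be read directly; a minimal sketch, assuming `pandas` and `matplotlib` are installed and that the file is tab-separated as Flair writes it:

```python
import pandas as pd
import matplotlib.pyplot as plt

# loss.tsv has the header row shown above; Flair writes it tab-separated
df = pd.read_csv('loss.tsv', sep='\t')

plt.plot(df['EPOCH'], df['TRAIN_LOSS'], marker='o')
plt.xlabel('epoch')
plt.ylabel('train loss')
plt.show()
```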
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:69644e87635b92a84d0f23f67c0fce11eac39a3c9a0dae107e7e3e0d6ef20edd
size 2239866697
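
This file is a Git LFS pointer rather than the weights themselves; `SequenceTagger.load("flair/ner-german-large")` fetches and caches the real binary automatically. A minimal sketch for downloading it directly, assuming the `huggingface_hub` client:

```python
from huggingface_hub import hf_hub_download

# resolves the LFS pointer above and caches the ~2.2 GB weight file locally
path = hf_hub_download(repo_id="flair/ner-german-large", filename="pytorch_model.bin")
print(path)
```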
training.log ADDED
@@ -0,0 +1,892 @@
2021-01-20 22:30:34,817 ----------------------------------------------------------------------------------------------------
2021-01-20 22:30:34,820 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): XLMRobertaModel(
      (embeddings): RobertaEmbeddings(
        (word_embeddings): Embedding(250002, 1024, padding_idx=1)
        (position_embeddings): Embedding(514, 1024, padding_idx=1)
        (token_type_embeddings): Embedding(1, 1024)
        (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): RobertaEncoder(
        (layer): ModuleList(
          (0): RobertaLayer(
            (attention): RobertaAttention(
              (self): RobertaSelfAttention(
                (query): Linear(in_features=1024, out_features=1024, bias=True)
                (key): Linear(in_features=1024, out_features=1024, bias=True)
                (value): Linear(in_features=1024, out_features=1024, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): RobertaSelfOutput(
                (dense): Linear(in_features=1024, out_features=1024, bias=True)
                (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): RobertaIntermediate(
              (dense): Linear(in_features=1024, out_features=4096, bias=True)
            )
            (output): RobertaOutput(
              (dense): Linear(in_features=4096, out_features=1024, bias=True)
              (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          [... (1) through (23): 23 further RobertaLayer blocks, identical to (0), elided ...]
        )
      )
      (pooler): RobertaPooler(
        (dense): Linear(in_features=1024, out_features=1024, bias=True)
        (activation): Tanh()
      )
    )
  )
  (word_dropout): WordDropout(p=0.05)
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1024, out_features=20, bias=True)
  (beta): 1.0
  (weights): None
  (weight_tensor) None
)"
2021-01-20 22:30:34,821 ----------------------------------------------------------------------------------------------------
2021-01-20 22:30:34,821 Corpus: "Corpus: 16093 train + 2969 dev + 5314 test sentences"
2021-01-20 22:30:34,821 ----------------------------------------------------------------------------------------------------
2021-01-20 22:30:34,821 Parameters:
2021-01-20 22:30:34,821  - learning_rate: "5e-06"
2021-01-20 22:30:34,821  - mini_batch_size: "4"
2021-01-20 22:30:34,821  - patience: "3"
2021-01-20 22:30:34,821  - anneal_factor: "0.5"
2021-01-20 22:30:34,822  - max_epochs: "20"
2021-01-20 22:30:34,822  - shuffle: "True"
2021-01-20 22:30:34,822  - train_with_dev: "True"
2021-01-20 22:30:34,822  - batch_growth_annealing: "False"
2021-01-20 22:30:34,822 ----------------------------------------------------------------------------------------------------
2021-01-20 22:30:34,822 Model training base path: "resources/contextdrop/flert-nl-ft+dev-xlm-roberta-large-context+drop-64-True-127"
2021-01-20 22:30:34,822 ----------------------------------------------------------------------------------------------------
2021-01-20 22:30:34,822 Device: cuda:0
2021-01-20 22:30:34,822 ----------------------------------------------------------------------------------------------------
2021-01-20 22:30:34,822 Embeddings storage mode: none
2021-01-20 22:30:34,833 ----------------------------------------------------------------------------------------------------
2021-01-20 22:34:24,138 epoch 1 - iter 476/4766 - loss 0.75007446 - samples/sec: 8.30 - lr: 0.000005
2021-01-20 22:38:11,813 epoch 1 - iter 952/4766 - loss 0.55138470 - samples/sec: 8.36 - lr: 0.000005
2021-01-20 22:42:03,548 epoch 1 - iter 1428/4766 - loss 0.46882800 - samples/sec: 8.22 - lr: 0.000005
2021-01-20 22:45:56,496 epoch 1 - iter 1904/4766 - loss 0.42568348 - samples/sec: 8.17 - lr: 0.000005
2021-01-20 22:49:48,705 epoch 1 - iter 2380/4766 - loss 0.40460601 - samples/sec: 8.20 - lr: 0.000005
2021-01-20 22:53:40,511 epoch 1 - iter 2856/4766 - loss 0.38479376 - samples/sec: 8.21 - lr: 0.000005
2021-01-20 22:57:31,693 epoch 1 - iter 3332/4766 - loss 0.36783532 - samples/sec: 8.24 - lr: 0.000005
2021-01-20 23:01:24,894 epoch 1 - iter 3808/4766 - loss 0.35297261 - samples/sec: 8.17 - lr: 0.000005
2021-01-20 23:05:16,842 epoch 1 - iter 4284/4766 - loss 0.33562353 - samples/sec: 8.21 - lr: 0.000005
2021-01-20 23:09:08,356 epoch 1 - iter 4760/4766 - loss 0.32624764 - samples/sec: 8.22 - lr: 0.000005
2021-01-20 23:09:11,043 ----------------------------------------------------------------------------------------------------
2021-01-20 23:09:11,044 EPOCH 1 done: loss 0.3260 - lr 0.0000050
2021-01-20 23:09:11,044 BAD EPOCHS (no improvement): 4
2021-01-20 23:09:11,056 ----------------------------------------------------------------------------------------------------
[... per-iteration progress lines and separators for epochs 2-20 elided; each epoch logged ten such lines, then the "EPOCH n done" summary below followed by "BAD EPOCHS (no improvement): 4"; per-epoch losses are also recorded in loss.tsv above ...]
2021-01-20 23:47:24,928 EPOCH 2 done: loss 0.2291 - lr 0.0000049
2021-01-21 00:24:29,094 EPOCH 3 done: loss 0.1856 - lr 0.0000047
2021-01-21 01:01:23,546 EPOCH 4 done: loss 0.1656 - lr 0.0000045
2021-01-21 01:38:16,065 EPOCH 5 done: loss 0.1648 - lr 0.0000043
2021-01-21 02:15:11,480 EPOCH 6 done: loss 0.1648 - lr 0.0000040
2021-01-21 02:52:04,547 EPOCH 7 done: loss 0.1620 - lr 0.0000036
2021-01-21 03:30:04,136 EPOCH 8 done: loss 0.1390 - lr 0.0000033
2021-01-21 04:06:55,162 EPOCH 9 done: loss 0.1559 - lr 0.0000029
2021-01-21 04:46:02,392 EPOCH 10 done: loss 0.1625 - lr 0.0000025
2021-01-21 05:24:31,802 EPOCH 11 done: loss 0.1467 - lr 0.0000021
2021-01-21 06:01:33,699 EPOCH 12 done: loss 0.1448 - lr 0.0000017
2021-01-21 06:39:47,605 EPOCH 13 done: loss 0.1512 - lr 0.0000014
2021-01-21 07:17:44,436 EPOCH 14 done: loss 0.1467 - lr 0.0000010
2021-01-21 07:55:53,003 EPOCH 15 done: loss 0.1473 - lr 0.0000007
2021-01-21 08:35:02,198 EPOCH 16 done: loss 0.1456 - lr 0.0000005
2021-01-21 09:14:10,160 EPOCH 17 done: loss 0.1403 - lr 0.0000003
2021-01-21 09:46:00,567 EPOCH 18 done: loss 0.1448 - lr 0.0000001
2021-01-21 10:18:27,049 EPOCH 19 done: loss 0.1386 - lr 0.0000000
2021-01-21 10:50:38,248 EPOCH 20 done: loss 0.1348 - lr 0.0000000
2021-01-21 10:51:28,424 ----------------------------------------------------------------------------------------------------
2021-01-21 10:51:28,425 Testing using best model ...
2021-01-21 10:54:06,963 0.9530	0.9520	0.9525
2021-01-21 10:54:06,963
Results:
- F1-score (micro) 0.9525
- F1-score (macro) 0.9528

By class:
LOC  tp: 751 - fp: 36 - fn: 23 - precision: 0.9543 - recall: 0.9703 - f1-score: 0.9622
MISC tp: 1095 - fp: 56 - fn: 92 - precision: 0.9513 - recall: 0.9225 - f1-score: 0.9367
ORG  tp: 834 - fp: 59 - fn: 48 - precision: 0.9339 - recall: 0.9456 - f1-score: 0.9397
PER  tp: 1072 - fp: 34 - fn: 26 - precision: 0.9693 - recall: 0.9763 - f1-score: 0.9728
2021-01-21 10:54:06,963 ----------------------------------------------------------------------------------------------------