2023-10-17 16:47:12,055 ----------------------------------------------------------------------------------------------------
2023-10-17 16:47:12,057 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): ElectraModel(
      (embeddings): ElectraEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): ElectraEncoder(
        (layer): ModuleList(
          (0-11): 12 x ElectraLayer(
            (attention): ElectraAttention(
              (self): ElectraSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): ElectraSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): ElectraIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): ElectraOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=21, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-17 16:47:12,057 ----------------------------------------------------------------------------------------------------
2023-10-17 16:47:12,057 MultiCorpus: 3575 train + 1235 dev + 1266 test sentences
 - NER_HIPE_2022 Corpus: 3575 train + 1235 dev + 1266 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/hipe2020/de/with_doc_seperator
2023-10-17 16:47:12,057 ----------------------------------------------------------------------------------------------------
2023-10-17 16:47:12,057 Train:  3575 sentences
2023-10-17 16:47:12,057         (train_with_dev=False, train_with_test=False)
2023-10-17 16:47:12,057 ----------------------------------------------------------------------------------------------------
2023-10-17 16:47:12,058 Training Params:
2023-10-17 16:47:12,058  - learning_rate: "3e-05" 
2023-10-17 16:47:12,058  - mini_batch_size: "8"
2023-10-17 16:47:12,058  - max_epochs: "10"
2023-10-17 16:47:12,058  - shuffle: "True"
2023-10-17 16:47:12,058 ----------------------------------------------------------------------------------------------------
2023-10-17 16:47:12,058 Plugins:
2023-10-17 16:47:12,058  - TensorboardLogger
2023-10-17 16:47:12,058  - LinearScheduler | warmup_fraction: '0.1'
2023-10-17 16:47:12,058 ----------------------------------------------------------------------------------------------------
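Note on the schedule: the parameters above (learning_rate 3e-05, max_epochs 10, 447 iterations per epoch, warmup_fraction 0.1) are consistent with a learning rate that rises linearly over the first 10% of steps and then decays linearly to zero. The sketch below is only an illustration under that assumption; the function name `linear_warmup_lr` is hypothetical and is not the scheduler plugin's own code. The values logged per iteration may differ from it by a step.

```python
def linear_warmup_lr(step, total_steps, base_lr=3e-5, warmup_fraction=0.1):
    """Assumed schedule: linear warmup to base_lr, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        # Warmup phase: scale up proportionally to the step count.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale down proportionally to the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# 447 iterations per epoch (3575 sentences at mini_batch_size 8) over 10 epochs.
TOTAL_STEPS = 447 * 10

if __name__ == "__main__":
    for step in (0, 44, 447, 2235, 4470):
        print(f"step {step:4d}: lr = {linear_warmup_lr(step, TOTAL_STEPS):.6f}")
```

Under this assumption the peak of 3e-05 falls at the end of epoch 1 and the rate reaches zero at the end of epoch 10, which matches the overall shape of the lr column in the iteration lines.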
2023-10-17 16:47:12,058 Final evaluation on model from best epoch (best-model.pt)
2023-10-17 16:47:12,058  - metric: "('micro avg', 'f1-score')"
2023-10-17 16:47:12,058 ----------------------------------------------------------------------------------------------------
2023-10-17 16:47:12,059 Computation:
2023-10-17 16:47:12,059  - compute on device: cuda:0
2023-10-17 16:47:12,059  - embedding storage: none
2023-10-17 16:47:12,059 ----------------------------------------------------------------------------------------------------
2023-10-17 16:47:12,059 Model training base path: "hmbench-hipe2020/de-hmteams/teams-base-historic-multilingual-discriminator-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-3"
2023-10-17 16:47:12,059 ----------------------------------------------------------------------------------------------------
2023-10-17 16:47:12,059 ----------------------------------------------------------------------------------------------------
2023-10-17 16:47:12,059 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-17 16:47:16,186 epoch 1 - iter 44/447 - loss 3.49661378 - time (sec): 4.13 - samples/sec: 1872.19 - lr: 0.000003 - momentum: 0.000000
2023-10-17 16:47:20,660 epoch 1 - iter 88/447 - loss 2.61631042 - time (sec): 8.60 - samples/sec: 1956.66 - lr: 0.000006 - momentum: 0.000000
2023-10-17 16:47:24,936 epoch 1 - iter 132/447 - loss 1.93352666 - time (sec): 12.88 - samples/sec: 1984.67 - lr: 0.000009 - momentum: 0.000000
2023-10-17 16:47:28,939 epoch 1 - iter 176/447 - loss 1.57530896 - time (sec): 16.88 - samples/sec: 2000.58 - lr: 0.000012 - momentum: 0.000000
2023-10-17 16:47:33,220 epoch 1 - iter 220/447 - loss 1.34725282 - time (sec): 21.16 - samples/sec: 1996.09 - lr: 0.000015 - momentum: 0.000000
2023-10-17 16:47:37,715 epoch 1 - iter 264/447 - loss 1.15393413 - time (sec): 25.65 - samples/sec: 2022.57 - lr: 0.000018 - momentum: 0.000000
2023-10-17 16:47:41,873 epoch 1 - iter 308/447 - loss 1.04226944 - time (sec): 29.81 - samples/sec: 2024.30 - lr: 0.000021 - momentum: 0.000000
2023-10-17 16:47:45,852 epoch 1 - iter 352/447 - loss 0.95292814 - time (sec): 33.79 - samples/sec: 2021.95 - lr: 0.000024 - momentum: 0.000000
2023-10-17 16:47:50,370 epoch 1 - iter 396/447 - loss 0.87278568 - time (sec): 38.31 - samples/sec: 2013.81 - lr: 0.000027 - momentum: 0.000000
2023-10-17 16:47:54,446 epoch 1 - iter 440/447 - loss 0.81583916 - time (sec): 42.39 - samples/sec: 2007.91 - lr: 0.000029 - momentum: 0.000000
2023-10-17 16:47:55,074 ----------------------------------------------------------------------------------------------------
2023-10-17 16:47:55,075 EPOCH 1 done: loss 0.8052 - lr: 0.000029
2023-10-17 16:48:01,442 DEV : loss 0.18609091639518738 - f1-score (micro avg)  0.62
2023-10-17 16:48:01,495 saving best model
2023-10-17 16:48:02,032 ----------------------------------------------------------------------------------------------------
2023-10-17 16:48:06,103 epoch 2 - iter 44/447 - loss 0.17646139 - time (sec): 4.07 - samples/sec: 2094.88 - lr: 0.000030 - momentum: 0.000000
2023-10-17 16:48:10,127 epoch 2 - iter 88/447 - loss 0.17558676 - time (sec): 8.09 - samples/sec: 2086.39 - lr: 0.000029 - momentum: 0.000000
2023-10-17 16:48:14,170 epoch 2 - iter 132/447 - loss 0.16683960 - time (sec): 12.14 - samples/sec: 2026.77 - lr: 0.000029 - momentum: 0.000000
2023-10-17 16:48:18,156 epoch 2 - iter 176/447 - loss 0.16952362 - time (sec): 16.12 - samples/sec: 1992.29 - lr: 0.000029 - momentum: 0.000000
2023-10-17 16:48:22,496 epoch 2 - iter 220/447 - loss 0.16761727 - time (sec): 20.46 - samples/sec: 2027.06 - lr: 0.000028 - momentum: 0.000000
2023-10-17 16:48:26,893 epoch 2 - iter 264/447 - loss 0.17109456 - time (sec): 24.86 - samples/sec: 2038.35 - lr: 0.000028 - momentum: 0.000000
2023-10-17 16:48:30,902 epoch 2 - iter 308/447 - loss 0.17214782 - time (sec): 28.87 - samples/sec: 2044.42 - lr: 0.000028 - momentum: 0.000000
2023-10-17 16:48:35,157 epoch 2 - iter 352/447 - loss 0.16542676 - time (sec): 33.12 - samples/sec: 2051.24 - lr: 0.000027 - momentum: 0.000000
2023-10-17 16:48:39,571 epoch 2 - iter 396/447 - loss 0.15856207 - time (sec): 37.54 - samples/sec: 2060.57 - lr: 0.000027 - momentum: 0.000000
2023-10-17 16:48:43,542 epoch 2 - iter 440/447 - loss 0.15666011 - time (sec): 41.51 - samples/sec: 2055.00 - lr: 0.000027 - momentum: 0.000000
2023-10-17 16:48:44,156 ----------------------------------------------------------------------------------------------------
2023-10-17 16:48:44,156 EPOCH 2 done: loss 0.1564 - lr: 0.000027
2023-10-17 16:48:55,101 DEV : loss 0.11850441992282867 - f1-score (micro avg)  0.7004
2023-10-17 16:48:55,153 saving best model
2023-10-17 16:48:56,532 ----------------------------------------------------------------------------------------------------
2023-10-17 16:49:00,603 epoch 3 - iter 44/447 - loss 0.08474878 - time (sec): 4.07 - samples/sec: 2113.03 - lr: 0.000026 - momentum: 0.000000
2023-10-17 16:49:04,693 epoch 3 - iter 88/447 - loss 0.08245773 - time (sec): 8.16 - samples/sec: 2090.39 - lr: 0.000026 - momentum: 0.000000
2023-10-17 16:49:09,017 epoch 3 - iter 132/447 - loss 0.08517300 - time (sec): 12.48 - samples/sec: 2086.23 - lr: 0.000026 - momentum: 0.000000
2023-10-17 16:49:13,016 epoch 3 - iter 176/447 - loss 0.08575584 - time (sec): 16.48 - samples/sec: 2062.81 - lr: 0.000025 - momentum: 0.000000
2023-10-17 16:49:17,520 epoch 3 - iter 220/447 - loss 0.08751252 - time (sec): 20.98 - samples/sec: 2038.05 - lr: 0.000025 - momentum: 0.000000
2023-10-17 16:49:22,077 epoch 3 - iter 264/447 - loss 0.08907774 - time (sec): 25.54 - samples/sec: 2025.88 - lr: 0.000025 - momentum: 0.000000
2023-10-17 16:49:26,176 epoch 3 - iter 308/447 - loss 0.08626226 - time (sec): 29.64 - samples/sec: 2027.47 - lr: 0.000024 - momentum: 0.000000
2023-10-17 16:49:30,200 epoch 3 - iter 352/447 - loss 0.08544823 - time (sec): 33.66 - samples/sec: 2036.17 - lr: 0.000024 - momentum: 0.000000
2023-10-17 16:49:34,477 epoch 3 - iter 396/447 - loss 0.08472916 - time (sec): 37.94 - samples/sec: 2041.16 - lr: 0.000024 - momentum: 0.000000
2023-10-17 16:49:38,389 epoch 3 - iter 440/447 - loss 0.08337869 - time (sec): 41.85 - samples/sec: 2039.72 - lr: 0.000023 - momentum: 0.000000
2023-10-17 16:49:39,011 ----------------------------------------------------------------------------------------------------
2023-10-17 16:49:39,011 EPOCH 3 done: loss 0.0831 - lr: 0.000023
2023-10-17 16:49:50,287 DEV : loss 0.16098077595233917 - f1-score (micro avg)  0.7368
2023-10-17 16:49:50,342 saving best model
2023-10-17 16:49:51,747 ----------------------------------------------------------------------------------------------------
2023-10-17 16:49:56,121 epoch 4 - iter 44/447 - loss 0.06080484 - time (sec): 4.37 - samples/sec: 2053.56 - lr: 0.000023 - momentum: 0.000000
2023-10-17 16:50:00,102 epoch 4 - iter 88/447 - loss 0.05047837 - time (sec): 8.35 - samples/sec: 2073.72 - lr: 0.000023 - momentum: 0.000000
2023-10-17 16:50:04,214 epoch 4 - iter 132/447 - loss 0.04822910 - time (sec): 12.46 - samples/sec: 2078.43 - lr: 0.000022 - momentum: 0.000000
2023-10-17 16:50:08,227 epoch 4 - iter 176/447 - loss 0.05146065 - time (sec): 16.48 - samples/sec: 2067.12 - lr: 0.000022 - momentum: 0.000000
2023-10-17 16:50:12,222 epoch 4 - iter 220/447 - loss 0.05457535 - time (sec): 20.47 - samples/sec: 2081.48 - lr: 0.000022 - momentum: 0.000000
2023-10-17 16:50:16,289 epoch 4 - iter 264/447 - loss 0.05684864 - time (sec): 24.54 - samples/sec: 2089.74 - lr: 0.000021 - momentum: 0.000000
2023-10-17 16:50:20,515 epoch 4 - iter 308/447 - loss 0.05629000 - time (sec): 28.76 - samples/sec: 2067.32 - lr: 0.000021 - momentum: 0.000000
2023-10-17 16:50:24,525 epoch 4 - iter 352/447 - loss 0.05515985 - time (sec): 32.77 - samples/sec: 2058.16 - lr: 0.000021 - momentum: 0.000000
2023-10-17 16:50:29,074 epoch 4 - iter 396/447 - loss 0.05406735 - time (sec): 37.32 - samples/sec: 2059.09 - lr: 0.000020 - momentum: 0.000000
2023-10-17 16:50:33,156 epoch 4 - iter 440/447 - loss 0.05404876 - time (sec): 41.40 - samples/sec: 2055.53 - lr: 0.000020 - momentum: 0.000000
2023-10-17 16:50:33,828 ----------------------------------------------------------------------------------------------------
2023-10-17 16:50:33,829 EPOCH 4 done: loss 0.0537 - lr: 0.000020
2023-10-17 16:50:44,774 DEV : loss 0.17118440568447113 - f1-score (micro avg)  0.7738
2023-10-17 16:50:44,823 saving best model
2023-10-17 16:50:46,179 ----------------------------------------------------------------------------------------------------
2023-10-17 16:50:50,043 epoch 5 - iter 44/447 - loss 0.02823819 - time (sec): 3.86 - samples/sec: 2125.42 - lr: 0.000020 - momentum: 0.000000
2023-10-17 16:50:54,046 epoch 5 - iter 88/447 - loss 0.02974272 - time (sec): 7.86 - samples/sec: 2178.10 - lr: 0.000019 - momentum: 0.000000
2023-10-17 16:50:58,481 epoch 5 - iter 132/447 - loss 0.03274366 - time (sec): 12.30 - samples/sec: 2173.97 - lr: 0.000019 - momentum: 0.000000
2023-10-17 16:51:02,421 epoch 5 - iter 176/447 - loss 0.03126456 - time (sec): 16.24 - samples/sec: 2120.29 - lr: 0.000019 - momentum: 0.000000
2023-10-17 16:51:06,701 epoch 5 - iter 220/447 - loss 0.02862125 - time (sec): 20.52 - samples/sec: 2123.83 - lr: 0.000018 - momentum: 0.000000
2023-10-17 16:51:10,991 epoch 5 - iter 264/447 - loss 0.02862398 - time (sec): 24.81 - samples/sec: 2103.31 - lr: 0.000018 - momentum: 0.000000
2023-10-17 16:51:15,112 epoch 5 - iter 308/447 - loss 0.02844421 - time (sec): 28.93 - samples/sec: 2083.75 - lr: 0.000018 - momentum: 0.000000
2023-10-17 16:51:19,138 epoch 5 - iter 352/447 - loss 0.02956228 - time (sec): 32.96 - samples/sec: 2071.76 - lr: 0.000017 - momentum: 0.000000
2023-10-17 16:51:23,488 epoch 5 - iter 396/447 - loss 0.03284724 - time (sec): 37.31 - samples/sec: 2064.00 - lr: 0.000017 - momentum: 0.000000
2023-10-17 16:51:27,485 epoch 5 - iter 440/447 - loss 0.03180610 - time (sec): 41.30 - samples/sec: 2062.28 - lr: 0.000017 - momentum: 0.000000
2023-10-17 16:51:28,153 ----------------------------------------------------------------------------------------------------
2023-10-17 16:51:28,154 EPOCH 5 done: loss 0.0315 - lr: 0.000017
2023-10-17 16:51:39,205 DEV : loss 0.17024928331375122 - f1-score (micro avg)  0.7777
2023-10-17 16:51:39,267 saving best model
2023-10-17 16:51:40,718 ----------------------------------------------------------------------------------------------------
2023-10-17 16:51:45,030 epoch 6 - iter 44/447 - loss 0.02203100 - time (sec): 4.31 - samples/sec: 2041.04 - lr: 0.000016 - momentum: 0.000000
2023-10-17 16:51:49,781 epoch 6 - iter 88/447 - loss 0.01920241 - time (sec): 9.06 - samples/sec: 2020.90 - lr: 0.000016 - momentum: 0.000000
2023-10-17 16:51:54,204 epoch 6 - iter 132/447 - loss 0.02226912 - time (sec): 13.48 - samples/sec: 1979.81 - lr: 0.000016 - momentum: 0.000000
2023-10-17 16:51:58,231 epoch 6 - iter 176/447 - loss 0.02378920 - time (sec): 17.51 - samples/sec: 1959.27 - lr: 0.000015 - momentum: 0.000000
2023-10-17 16:52:02,125 epoch 6 - iter 220/447 - loss 0.02333195 - time (sec): 21.40 - samples/sec: 1932.28 - lr: 0.000015 - momentum: 0.000000
2023-10-17 16:52:06,311 epoch 6 - iter 264/447 - loss 0.02310408 - time (sec): 25.59 - samples/sec: 1946.29 - lr: 0.000015 - momentum: 0.000000
2023-10-17 16:52:10,568 epoch 6 - iter 308/447 - loss 0.02234226 - time (sec): 29.85 - samples/sec: 1981.40 - lr: 0.000014 - momentum: 0.000000
2023-10-17 16:52:14,740 epoch 6 - iter 352/447 - loss 0.02289841 - time (sec): 34.02 - samples/sec: 1984.53 - lr: 0.000014 - momentum: 0.000000
2023-10-17 16:52:19,543 epoch 6 - iter 396/447 - loss 0.02189250 - time (sec): 38.82 - samples/sec: 1986.59 - lr: 0.000014 - momentum: 0.000000
2023-10-17 16:52:23,522 epoch 6 - iter 440/447 - loss 0.02187698 - time (sec): 42.80 - samples/sec: 1997.83 - lr: 0.000013 - momentum: 0.000000
2023-10-17 16:52:24,166 ----------------------------------------------------------------------------------------------------
2023-10-17 16:52:24,167 EPOCH 6 done: loss 0.0230 - lr: 0.000013
2023-10-17 16:52:34,982 DEV : loss 0.19983802735805511 - f1-score (micro avg)  0.803
2023-10-17 16:52:35,059 saving best model
2023-10-17 16:52:36,497 ----------------------------------------------------------------------------------------------------
2023-10-17 16:52:40,921 epoch 7 - iter 44/447 - loss 0.00850187 - time (sec): 4.42 - samples/sec: 2092.53 - lr: 0.000013 - momentum: 0.000000
2023-10-17 16:52:45,097 epoch 7 - iter 88/447 - loss 0.01184907 - time (sec): 8.60 - samples/sec: 2003.34 - lr: 0.000013 - momentum: 0.000000
2023-10-17 16:52:49,559 epoch 7 - iter 132/447 - loss 0.01008429 - time (sec): 13.06 - samples/sec: 1983.15 - lr: 0.000012 - momentum: 0.000000
2023-10-17 16:52:53,577 epoch 7 - iter 176/447 - loss 0.01116324 - time (sec): 17.08 - samples/sec: 1978.81 - lr: 0.000012 - momentum: 0.000000
2023-10-17 16:52:57,658 epoch 7 - iter 220/447 - loss 0.01311415 - time (sec): 21.16 - samples/sec: 1975.15 - lr: 0.000012 - momentum: 0.000000
2023-10-17 16:53:01,902 epoch 7 - iter 264/447 - loss 0.01472328 - time (sec): 25.40 - samples/sec: 1967.29 - lr: 0.000011 - momentum: 0.000000
2023-10-17 16:53:06,100 epoch 7 - iter 308/447 - loss 0.01385064 - time (sec): 29.60 - samples/sec: 1977.81 - lr: 0.000011 - momentum: 0.000000
2023-10-17 16:53:10,128 epoch 7 - iter 352/447 - loss 0.01331558 - time (sec): 33.63 - samples/sec: 1999.23 - lr: 0.000011 - momentum: 0.000000
2023-10-17 16:53:14,178 epoch 7 - iter 396/447 - loss 0.01397777 - time (sec): 37.68 - samples/sec: 2010.39 - lr: 0.000010 - momentum: 0.000000
2023-10-17 16:53:18,463 epoch 7 - iter 440/447 - loss 0.01368698 - time (sec): 41.96 - samples/sec: 2027.17 - lr: 0.000010 - momentum: 0.000000
2023-10-17 16:53:19,188 ----------------------------------------------------------------------------------------------------
2023-10-17 16:53:19,188 EPOCH 7 done: loss 0.0135 - lr: 0.000010
2023-10-17 16:53:30,279 DEV : loss 0.20469656586647034 - f1-score (micro avg)  0.7921
2023-10-17 16:53:30,335 ----------------------------------------------------------------------------------------------------
2023-10-17 16:53:34,480 epoch 8 - iter 44/447 - loss 0.00313666 - time (sec): 4.14 - samples/sec: 2080.39 - lr: 0.000010 - momentum: 0.000000
2023-10-17 16:53:38,581 epoch 8 - iter 88/447 - loss 0.00502648 - time (sec): 8.24 - samples/sec: 2049.56 - lr: 0.000009 - momentum: 0.000000
2023-10-17 16:53:42,753 epoch 8 - iter 132/447 - loss 0.00740875 - time (sec): 12.42 - samples/sec: 2062.49 - lr: 0.000009 - momentum: 0.000000
2023-10-17 16:53:47,062 epoch 8 - iter 176/447 - loss 0.00859745 - time (sec): 16.73 - samples/sec: 2020.65 - lr: 0.000009 - momentum: 0.000000
2023-10-17 16:53:51,256 epoch 8 - iter 220/447 - loss 0.00798549 - time (sec): 20.92 - samples/sec: 2006.40 - lr: 0.000008 - momentum: 0.000000
2023-10-17 16:53:55,572 epoch 8 - iter 264/447 - loss 0.00781959 - time (sec): 25.23 - samples/sec: 2015.40 - lr: 0.000008 - momentum: 0.000000
2023-10-17 16:53:59,928 epoch 8 - iter 308/447 - loss 0.00794785 - time (sec): 29.59 - samples/sec: 2008.36 - lr: 0.000008 - momentum: 0.000000
2023-10-17 16:54:04,317 epoch 8 - iter 352/447 - loss 0.00872736 - time (sec): 33.98 - samples/sec: 1991.41 - lr: 0.000007 - momentum: 0.000000
2023-10-17 16:54:08,671 epoch 8 - iter 396/447 - loss 0.00928899 - time (sec): 38.33 - samples/sec: 1991.38 - lr: 0.000007 - momentum: 0.000000
2023-10-17 16:54:13,050 epoch 8 - iter 440/447 - loss 0.00882367 - time (sec): 42.71 - samples/sec: 1995.01 - lr: 0.000007 - momentum: 0.000000
2023-10-17 16:54:13,689 ----------------------------------------------------------------------------------------------------
2023-10-17 16:54:13,689 EPOCH 8 done: loss 0.0088 - lr: 0.000007
2023-10-17 16:54:25,266 DEV : loss 0.21882730722427368 - f1-score (micro avg)  0.8018
2023-10-17 16:54:25,341 ----------------------------------------------------------------------------------------------------
2023-10-17 16:54:29,957 epoch 9 - iter 44/447 - loss 0.00911349 - time (sec): 4.61 - samples/sec: 1939.98 - lr: 0.000006 - momentum: 0.000000
2023-10-17 16:54:34,629 epoch 9 - iter 88/447 - loss 0.00622748 - time (sec): 9.28 - samples/sec: 2044.19 - lr: 0.000006 - momentum: 0.000000
2023-10-17 16:54:38,713 epoch 9 - iter 132/447 - loss 0.00477911 - time (sec): 13.37 - samples/sec: 2033.90 - lr: 0.000006 - momentum: 0.000000
2023-10-17 16:54:43,037 epoch 9 - iter 176/447 - loss 0.00545796 - time (sec): 17.69 - samples/sec: 1994.09 - lr: 0.000005 - momentum: 0.000000
2023-10-17 16:54:47,439 epoch 9 - iter 220/447 - loss 0.00514891 - time (sec): 22.09 - samples/sec: 2000.95 - lr: 0.000005 - momentum: 0.000000
2023-10-17 16:54:51,898 epoch 9 - iter 264/447 - loss 0.00530463 - time (sec): 26.55 - samples/sec: 1997.43 - lr: 0.000005 - momentum: 0.000000
2023-10-17 16:54:56,171 epoch 9 - iter 308/447 - loss 0.00519452 - time (sec): 30.83 - samples/sec: 1989.09 - lr: 0.000004 - momentum: 0.000000
2023-10-17 16:55:00,230 epoch 9 - iter 352/447 - loss 0.00567824 - time (sec): 34.89 - samples/sec: 1984.82 - lr: 0.000004 - momentum: 0.000000
2023-10-17 16:55:04,392 epoch 9 - iter 396/447 - loss 0.00625861 - time (sec): 39.05 - samples/sec: 1978.66 - lr: 0.000004 - momentum: 0.000000
2023-10-17 16:55:08,577 epoch 9 - iter 440/447 - loss 0.00656770 - time (sec): 43.23 - samples/sec: 1975.09 - lr: 0.000003 - momentum: 0.000000
2023-10-17 16:55:09,189 ----------------------------------------------------------------------------------------------------
2023-10-17 16:55:09,190 EPOCH 9 done: loss 0.0065 - lr: 0.000003
2023-10-17 16:55:20,752 DEV : loss 0.22874712944030762 - f1-score (micro avg)  0.8066
2023-10-17 16:55:20,817 saving best model
2023-10-17 16:55:22,230 ----------------------------------------------------------------------------------------------------
2023-10-17 16:55:26,626 epoch 10 - iter 44/447 - loss 0.00173260 - time (sec): 4.39 - samples/sec: 1946.13 - lr: 0.000003 - momentum: 0.000000
2023-10-17 16:55:30,957 epoch 10 - iter 88/447 - loss 0.00259927 - time (sec): 8.72 - samples/sec: 1922.78 - lr: 0.000003 - momentum: 0.000000
2023-10-17 16:55:34,844 epoch 10 - iter 132/447 - loss 0.00371022 - time (sec): 12.61 - samples/sec: 1951.89 - lr: 0.000002 - momentum: 0.000000
2023-10-17 16:55:39,216 epoch 10 - iter 176/447 - loss 0.00343244 - time (sec): 16.98 - samples/sec: 1973.14 - lr: 0.000002 - momentum: 0.000000
2023-10-17 16:55:43,550 epoch 10 - iter 220/447 - loss 0.00410654 - time (sec): 21.31 - samples/sec: 1982.22 - lr: 0.000002 - momentum: 0.000000
2023-10-17 16:55:48,037 epoch 10 - iter 264/447 - loss 0.00468622 - time (sec): 25.80 - samples/sec: 1992.41 - lr: 0.000001 - momentum: 0.000000
2023-10-17 16:55:51,992 epoch 10 - iter 308/447 - loss 0.00492962 - time (sec): 29.76 - samples/sec: 1994.33 - lr: 0.000001 - momentum: 0.000000
2023-10-17 16:55:56,019 epoch 10 - iter 352/447 - loss 0.00498152 - time (sec): 33.78 - samples/sec: 2022.40 - lr: 0.000001 - momentum: 0.000000
2023-10-17 16:55:59,943 epoch 10 - iter 396/447 - loss 0.00514324 - time (sec): 37.71 - samples/sec: 2033.47 - lr: 0.000000 - momentum: 0.000000
2023-10-17 16:56:04,083 epoch 10 - iter 440/447 - loss 0.00486802 - time (sec): 41.85 - samples/sec: 2035.70 - lr: 0.000000 - momentum: 0.000000
2023-10-17 16:56:04,730 ----------------------------------------------------------------------------------------------------
2023-10-17 16:56:04,731 EPOCH 10 done: loss 0.0048 - lr: 0.000000
2023-10-17 16:56:15,689 DEV : loss 0.23447194695472717 - f1-score (micro avg)  0.805
2023-10-17 16:56:16,295 ----------------------------------------------------------------------------------------------------
2023-10-17 16:56:16,298 Loading model from best epoch ...
2023-10-17 16:56:19,028 SequenceTagger predicts: Dictionary with 21 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org, S-prod, B-prod, E-prod, I-prod, S-time, B-time, E-time, I-time
2023-10-17 16:56:25,007 
Results:
- F-score (micro) 0.7627
- F-score (macro) 0.6747
- Accuracy 0.6391

By class:
              precision    recall  f1-score   support

         loc     0.8617    0.8574    0.8595       596
        pers     0.7067    0.7958    0.7486       333
         org     0.4667    0.5833    0.5185       132
        prod     0.5965    0.5152    0.5528        66
        time     0.6939    0.6939    0.6939        49

   micro avg     0.7433    0.7832    0.7627      1176
   macro avg     0.6651    0.6891    0.6747      1176
weighted avg     0.7516    0.7832    0.7657      1176

2023-10-17 16:56:25,007 ----------------------------------------------------------------------------------------------------
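Note on the averages: the summary rows of the "By class" table can be checked against the per-class rows. The sketch below copies the numbers from the table and recomputes each F1 as the harmonic mean of precision and recall, the macro average as the unweighted mean of the per-class F1 scores, and the weighted average using support as the weight. The variable and function names are illustrative, and because the inputs are rounded to four decimals the recomputed values agree with the log only to about that precision.

```python
# Rows copied from the "By class" table: (precision, recall, f1, support).
ROWS = {
    "loc":  (0.8617, 0.8574, 0.8595, 596),
    "pers": (0.7067, 0.7958, 0.7486, 333),
    "org":  (0.4667, 0.5833, 0.5185, 132),
    "prod": (0.5965, 0.5152, 0.5528, 66),
    "time": (0.6939, 0.6939, 0.6939, 49),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Macro average: every class counts equally.
macro_f1 = sum(row[2] for row in ROWS.values()) / len(ROWS)

# Weighted average: each class weighted by its support.
total_support = sum(row[3] for row in ROWS.values())
weighted_f1 = sum(row[2] * row[3] for row in ROWS.values()) / total_support

# Micro F1 recomputed from the reported micro precision and recall.
micro_f1 = f1_score(0.7433, 0.7832)

if __name__ == "__main__":
    print(f"macro F1    {macro_f1:.4f}")
    print(f"weighted F1 {weighted_f1:.4f}")
    print(f"micro F1    {micro_f1:.4f}")
```

The gap between the micro (0.7627) and macro (0.6747) scores reflects the class imbalance: loc and pers account for most of the 1176 test entities and score well, while the smaller org and prod classes pull the unweighted mean down.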