2023-10-13 23:38:56,260 ----------------------------------------------------------------------------------------------------
2023-10-13 23:38:56,261 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 23:38:56,261 ----------------------------------------------------------------------------------------------------
2023-10-13 23:38:56,261 MultiCorpus: 7936 train + 992 dev + 992 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /root/.flair/datasets/ner_icdar_europeana/fr
2023-10-13 23:38:56,261 ----------------------------------------------------------------------------------------------------
2023-10-13 23:38:56,261 Train:  7936 sentences
2023-10-13 23:38:56,261         (train_with_dev=False, train_with_test=False)
2023-10-13 23:38:56,261 ----------------------------------------------------------------------------------------------------
2023-10-13 23:38:56,261 Training Params:
2023-10-13 23:38:56,261  - learning_rate: "3e-05"
2023-10-13 23:38:56,261  - mini_batch_size: "4"
2023-10-13 23:38:56,261  - max_epochs: "10"
2023-10-13 23:38:56,261  - shuffle: "True"
2023-10-13 23:38:56,261 ----------------------------------------------------------------------------------------------------
2023-10-13 23:38:56,261 Plugins:
2023-10-13 23:38:56,261  - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 23:38:56,261 ----------------------------------------------------------------------------------------------------
2023-10-13 23:38:56,261 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 23:38:56,261  - metric: "('micro avg', 'f1-score')"
2023-10-13 23:38:56,262 ----------------------------------------------------------------------------------------------------
2023-10-13 23:38:56,262 Computation:
2023-10-13 23:38:56,262  - compute on device: cuda:0
2023-10-13 23:38:56,262  - embedding storage: none
2023-10-13 23:38:56,262 ----------------------------------------------------------------------------------------------------
2023-10-13 23:38:56,262 Model training base path: "hmbench-icdar/fr-dbmdz/bert-base-historic-multilingual-cased-bs4-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-4"
2023-10-13 23:38:56,262 ----------------------------------------------------------------------------------------------------
2023-10-13 23:38:56,262 ----------------------------------------------------------------------------------------------------
2023-10-13 23:39:05,623 epoch 1 - iter 198/1984 - loss 1.83762815 - time (sec): 9.36 - samples/sec: 1740.22 - lr: 0.000003 - momentum: 0.000000
2023-10-13 23:39:14,861 epoch 1 - iter 396/1984 - loss 1.09662191 - time (sec): 18.60 - samples/sec: 1741.17 - lr: 0.000006 - momentum: 0.000000
2023-10-13 23:39:23,842 epoch 1 - iter 594/1984 - loss 0.81504132 - time (sec): 27.58 - samples/sec: 1746.82 - lr: 0.000009 - momentum: 0.000000
2023-10-13 23:39:32,828 epoch 1 - iter 792/1984 - loss 0.66249214 - time (sec): 36.57 - samples/sec: 1765.29 - lr: 0.000012 - momentum: 0.000000
2023-10-13 23:39:41,985 epoch 1 - iter 990/1984 - loss 0.56311956 - time (sec): 45.72 - samples/sec: 1779.23 - lr: 0.000015 - momentum: 0.000000
2023-10-13 23:39:51,226 epoch 1 - iter 1188/1984 - loss 0.48540145 - time (sec): 54.96 - samples/sec: 1808.89 - lr: 0.000018 - momentum: 0.000000
2023-10-13 23:40:00,329 epoch 1 - iter 1386/1984 - loss 0.44028878 - time (sec): 64.07 - samples/sec: 1802.31 - lr: 0.000021 - momentum: 0.000000
2023-10-13 23:40:09,323 epoch 1 - iter 1584/1984 - loss 0.40258648 - time (sec): 73.06 - samples/sec: 1804.47 - lr: 0.000024 - momentum: 0.000000
2023-10-13 23:40:18,196 epoch 1 - iter 1782/1984 - loss 0.37540297 - time (sec): 81.93 - samples/sec: 1799.90 - lr: 0.000027 - momentum: 0.000000
2023-10-13 23:40:27,100 epoch 1 - iter 1980/1984 - loss 0.35246436 - time (sec): 90.84 - samples/sec: 1801.35 - lr: 0.000030 - momentum: 0.000000
2023-10-13 23:40:27,278 ----------------------------------------------------------------------------------------------------
2023-10-13 23:40:27,278 EPOCH 1 done: loss 0.3524 - lr: 0.000030
2023-10-13 23:40:30,422 DEV : loss 0.12433891743421555 - f1-score (micro avg)  0.7099
2023-10-13 23:40:30,444 saving best model
2023-10-13 23:40:30,861 ----------------------------------------------------------------------------------------------------
2023-10-13 23:40:39,812 epoch 2 - iter 198/1984 - loss 0.11828575 - time (sec): 8.95 - samples/sec: 1789.32 - lr: 0.000030 - momentum: 0.000000
2023-10-13 23:40:48,784 epoch 2 - iter 396/1984 - loss 0.11140177 - time (sec): 17.92 - samples/sec: 1815.27 - lr: 0.000029 - momentum: 0.000000
2023-10-13 23:40:58,222 epoch 2 - iter 594/1984 - loss 0.11940888 - time (sec): 27.36 - samples/sec: 1789.15 - lr: 0.000029 - momentum: 0.000000
2023-10-13 23:41:07,249 epoch 2 - iter 792/1984 - loss 0.11906737 - time (sec): 36.39 - samples/sec: 1799.64 - lr: 0.000029 - momentum: 0.000000
2023-10-13 23:41:16,257 epoch 2 - iter 990/1984 - loss 0.11891276 - time (sec): 45.39 - samples/sec: 1804.73 - lr: 0.000028 - momentum: 0.000000
2023-10-13 23:41:25,219 epoch 2 - iter 1188/1984 - loss 0.11640076 - time (sec): 54.36 - samples/sec: 1812.33 - lr: 0.000028 - momentum: 0.000000
2023-10-13 23:41:34,108 epoch 2 - iter 1386/1984 - loss 0.11635323 - time (sec): 63.25 - samples/sec: 1815.08 - lr: 0.000028 - momentum: 0.000000
2023-10-13 23:41:43,049 epoch 2 - iter 1584/1984 - loss 0.11619891 - time (sec): 72.19 - samples/sec: 1814.34 - lr: 0.000027 - momentum: 0.000000
2023-10-13 23:41:52,022 epoch 2 - iter 1782/1984 - loss 0.11338861 - time (sec): 81.16 - samples/sec: 1818.54 - lr: 0.000027 - momentum: 0.000000
2023-10-13 23:42:01,193 epoch 2 - iter 1980/1984 - loss 0.11087439 - time (sec): 90.33 - samples/sec: 1812.59 - lr: 0.000027 - momentum: 0.000000
2023-10-13 23:42:01,372 ----------------------------------------------------------------------------------------------------
2023-10-13 23:42:01,372 EPOCH 2 done: loss 0.1110 - lr: 0.000027
2023-10-13 23:42:05,217 DEV : loss 0.09629001468420029 - f1-score (micro avg)  0.7285
2023-10-13 23:42:05,237 saving best model
2023-10-13 23:42:05,739 ----------------------------------------------------------------------------------------------------
2023-10-13 23:42:15,161 epoch 3 - iter 198/1984 - loss 0.07429461 - time (sec): 9.42 - samples/sec: 1699.53 - lr: 0.000026 - momentum: 0.000000
2023-10-13 23:42:24,281 epoch 3 - iter 396/1984 - loss 0.08110074 - time (sec): 18.54 - samples/sec: 1720.44 - lr: 0.000026 - momentum: 0.000000
2023-10-13 23:42:33,319 epoch 3 - iter 594/1984 - loss 0.08477110 - time (sec): 27.58 - samples/sec: 1770.13 - lr: 0.000026 - momentum: 0.000000
2023-10-13 23:42:42,426 epoch 3 - iter 792/1984 - loss 0.08227969 - time (sec): 36.68 - samples/sec: 1798.59 - lr: 0.000025 - momentum: 0.000000
2023-10-13 23:42:51,384 epoch 3 - iter 990/1984 - loss 0.08206052 - time (sec): 45.64 - samples/sec: 1800.84 - lr: 0.000025 - momentum: 0.000000
2023-10-13 23:43:00,378 epoch 3 - iter 1188/1984 - loss 0.08348155 - time (sec): 54.63 - samples/sec: 1794.26 - lr: 0.000025 - momentum: 0.000000
2023-10-13 23:43:09,338 epoch 3 - iter 1386/1984 - loss 0.08114266 - time (sec): 63.60 - samples/sec: 1796.94 - lr: 0.000024 - momentum: 0.000000
2023-10-13 23:43:18,462 epoch 3 - iter 1584/1984 - loss 0.07984718 - time (sec): 72.72 - samples/sec: 1803.30 - lr: 0.000024 - momentum: 0.000000
2023-10-13 23:43:27,624 epoch 3 - iter 1782/1984 - loss 0.07934502 - time (sec): 81.88 - samples/sec: 1801.00 - lr: 0.000024 - momentum: 0.000000
2023-10-13 23:43:36,624 epoch 3 - iter 1980/1984 - loss 0.07959415 - time (sec): 90.88 - samples/sec: 1802.37 - lr: 0.000023 - momentum: 0.000000
2023-10-13 23:43:36,802 ----------------------------------------------------------------------------------------------------
2023-10-13 23:43:36,802 EPOCH 3 done: loss 0.0797 - lr: 0.000023
2023-10-13 23:43:40,249 DEV : loss 0.11717832088470459 - f1-score (micro avg)  0.7546
2023-10-13 23:43:40,270 saving best model
2023-10-13 23:43:40,823 ----------------------------------------------------------------------------------------------------
2023-10-13 23:43:50,001 epoch 4 - iter 198/1984 - loss 0.06052117 - time (sec): 9.18 - samples/sec: 1734.76 - lr: 0.000023 - momentum: 0.000000
2023-10-13 23:43:59,043 epoch 4 - iter 396/1984 - loss 0.06140378 - time (sec): 18.22 - samples/sec: 1796.43 - lr: 0.000023 - momentum: 0.000000
2023-10-13 23:44:08,039 epoch 4 - iter 594/1984 - loss 0.06052478 - time (sec): 27.21 - samples/sec: 1754.30 - lr: 0.000022 - momentum: 0.000000
2023-10-13 23:44:17,139 epoch 4 - iter 792/1984 - loss 0.06090929 - time (sec): 36.31 - samples/sec: 1772.51 - lr: 0.000022 - momentum: 0.000000
2023-10-13 23:44:26,199 epoch 4 - iter 990/1984 - loss 0.05940225 - time (sec): 45.37 - samples/sec: 1784.17 - lr: 0.000022 - momentum: 0.000000
2023-10-13 23:44:35,410 epoch 4 - iter 1188/1984 - loss 0.06144572 - time (sec): 54.59 - samples/sec: 1792.61 - lr: 0.000021 - momentum: 0.000000
2023-10-13 23:44:44,471 epoch 4 - iter 1386/1984 - loss 0.06104930 - time (sec): 63.65 - samples/sec: 1790.11 - lr: 0.000021 - momentum: 0.000000
2023-10-13 23:44:53,448 epoch 4 - iter 1584/1984 - loss 0.06021919 - time (sec): 72.62 - samples/sec: 1787.03 - lr: 0.000021 - momentum: 0.000000
2023-10-13 23:45:02,466 epoch 4 - iter 1782/1984 - loss 0.06016900 - time (sec): 81.64 - samples/sec: 1792.86 - lr: 0.000020 - momentum: 0.000000
2023-10-13 23:45:11,679 epoch 4 - iter 1980/1984 - loss 0.05925545 - time (sec): 90.85 - samples/sec: 1801.73 - lr: 0.000020 - momentum: 0.000000
2023-10-13 23:45:11,872 ----------------------------------------------------------------------------------------------------
2023-10-13 23:45:11,872 EPOCH 4 done: loss 0.0592 - lr: 0.000020
2023-10-13 23:45:15,405 DEV : loss 0.14380605518817902 - f1-score (micro avg)  0.7823
2023-10-13 23:45:15,440 saving best model
2023-10-13 23:45:15,946 ----------------------------------------------------------------------------------------------------
2023-10-13 23:45:25,143 epoch 5 - iter 198/1984 - loss 0.04087458 - time (sec): 9.19 - samples/sec: 1756.29 - lr: 0.000020 - momentum: 0.000000
2023-10-13 23:45:34,337 epoch 5 - iter 396/1984 - loss 0.04482892 - time (sec): 18.39 - samples/sec: 1786.25 - lr: 0.000019 - momentum: 0.000000
2023-10-13 23:45:43,527 epoch 5 - iter 594/1984 - loss 0.04311401 - time (sec): 27.58 - samples/sec: 1811.77 - lr: 0.000019 - momentum: 0.000000
2023-10-13 23:45:52,597 epoch 5 - iter 792/1984 - loss 0.04281423 - time (sec): 36.65 - samples/sec: 1791.80 - lr: 0.000019 - momentum: 0.000000
2023-10-13 23:46:01,560 epoch 5 - iter 990/1984 - loss 0.04332734 - time (sec): 45.61 - samples/sec: 1786.28 - lr: 0.000018 - momentum: 0.000000
2023-10-13 23:46:10,615 epoch 5 - iter 1188/1984 - loss 0.04358208 - time (sec): 54.66 - samples/sec: 1794.86 - lr: 0.000018 - momentum: 0.000000
2023-10-13 23:46:19,677 epoch 5 - iter 1386/1984 - loss 0.04379144 - time (sec): 63.73 - samples/sec: 1803.16 - lr: 0.000018 - momentum: 0.000000
2023-10-13 23:46:28,892 epoch 5 - iter 1584/1984 - loss 0.04545313 - time (sec): 72.94 - samples/sec: 1811.13 - lr: 0.000017 - momentum: 0.000000
2023-10-13 23:46:37,793 epoch 5 - iter 1782/1984 - loss 0.04375927 - time (sec): 81.84 - samples/sec: 1807.89 - lr: 0.000017 - momentum: 0.000000
2023-10-13 23:46:46,890 epoch 5 - iter 1980/1984 - loss 0.04461810 - time (sec): 90.94 - samples/sec: 1798.77 - lr: 0.000017 - momentum: 0.000000
2023-10-13 23:46:47,084 ----------------------------------------------------------------------------------------------------
2023-10-13 23:46:47,084 EPOCH 5 done: loss 0.0446 - lr: 0.000017
2023-10-13 23:46:51,086 DEV : loss 0.16696855425834656 - f1-score (micro avg)  0.767
2023-10-13 23:46:51,110 ----------------------------------------------------------------------------------------------------
2023-10-13 23:47:00,477 epoch 6 - iter 198/1984 - loss 0.03505539 - time (sec): 9.37 - samples/sec: 1863.68 - lr: 0.000016 - momentum: 0.000000
2023-10-13 23:47:09,450 epoch 6 - iter 396/1984 - loss 0.03251886 - time (sec): 18.34 - samples/sec: 1813.90 - lr: 0.000016 - momentum: 0.000000
2023-10-13 23:47:18,382 epoch 6 - iter 594/1984 - loss 0.03280510 - time (sec): 27.27 - samples/sec: 1791.04 - lr: 0.000016 - momentum: 0.000000
2023-10-13 23:47:27,447 epoch 6 - iter 792/1984 - loss 0.03472134 - time (sec): 36.34 - samples/sec: 1798.71 - lr: 0.000015 - momentum: 0.000000
2023-10-13 23:47:36,364 epoch 6 - iter 990/1984 - loss 0.03368279 - time (sec): 45.25 - samples/sec: 1793.01 - lr: 0.000015 - momentum: 0.000000
2023-10-13 23:47:45,255 epoch 6 - iter 1188/1984 - loss 0.03396626 - time (sec): 54.14 - samples/sec: 1792.49 - lr: 0.000015 - momentum: 0.000000
2023-10-13 23:47:54,494 epoch 6 - iter 1386/1984 - loss 0.03365918 - time (sec): 63.38 - samples/sec: 1800.85 - lr: 0.000014 - momentum: 0.000000
2023-10-13 23:48:03,472 epoch 6 - iter 1584/1984 - loss 0.03440385 - time (sec): 72.36 - samples/sec: 1803.11 - lr: 0.000014 - momentum: 0.000000
2023-10-13 23:48:12,439 epoch 6 - iter 1782/1984 - loss 0.03479304 - time (sec): 81.33 - samples/sec: 1808.13 - lr: 0.000014 - momentum: 0.000000
2023-10-13 23:48:21,422 epoch 6 - iter 1980/1984 - loss 0.03544213 - time (sec): 90.31 - samples/sec: 1813.02 - lr: 0.000013 - momentum: 0.000000
2023-10-13 23:48:21,597 ----------------------------------------------------------------------------------------------------
2023-10-13 23:48:21,598 EPOCH 6 done: loss 0.0354 - lr: 0.000013
2023-10-13 23:48:24,988 DEV : loss 0.1906966120004654 - f1-score (micro avg)  0.7609
2023-10-13 23:48:25,009 ----------------------------------------------------------------------------------------------------
2023-10-13 23:48:34,032 epoch 7 - iter 198/1984 - loss 0.02239552 - time (sec): 9.02 - samples/sec: 1854.40 - lr: 0.000013 - momentum: 0.000000
2023-10-13 23:48:42,982 epoch 7 - iter 396/1984 - loss 0.01908052 - time (sec): 17.97 - samples/sec: 1843.91 - lr: 0.000013 - momentum: 0.000000
2023-10-13 23:48:52,026 epoch 7 - iter 594/1984 - loss 0.01846812 - time (sec): 27.02 - samples/sec: 1843.30 - lr: 0.000012 - momentum: 0.000000
2023-10-13 23:49:01,113 epoch 7 - iter 792/1984 - loss 0.02010717 - time (sec): 36.10 - samples/sec: 1803.01 - lr: 0.000012 - momentum: 0.000000
2023-10-13 23:49:10,095 epoch 7 - iter 990/1984 - loss 0.02248072 - time (sec): 45.08 - samples/sec: 1819.55 - lr: 0.000012 - momentum: 0.000000
2023-10-13 23:49:18,978 epoch 7 - iter 1188/1984 - loss 0.02304749 - time (sec): 53.97 - samples/sec: 1817.48 - lr: 0.000011 - momentum: 0.000000
2023-10-13 23:49:27,935 epoch 7 - iter 1386/1984 - loss 0.02228094 - time (sec): 62.93 - samples/sec: 1812.85 - lr: 0.000011 - momentum: 0.000000
2023-10-13 23:49:36,927 epoch 7 - iter 1584/1984 - loss 0.02350682 - time (sec): 71.92 - samples/sec: 1811.32 - lr: 0.000011 - momentum: 0.000000
2023-10-13 23:49:46,117 epoch 7 - iter 1782/1984 - loss 0.02341870 - time (sec): 81.11 - samples/sec: 1810.88 - lr: 0.000010 - momentum: 0.000000
2023-10-13 23:49:55,199 epoch 7 - iter 1980/1984 - loss 0.02353248 - time (sec): 90.19 - samples/sec: 1815.61 - lr: 0.000010 - momentum: 0.000000
2023-10-13 23:49:55,377 ----------------------------------------------------------------------------------------------------
2023-10-13 23:49:55,377 EPOCH 7 done: loss 0.0235 - lr: 0.000010
2023-10-13 23:49:59,310 DEV : loss 0.19835640490055084 - f1-score (micro avg)  0.782
2023-10-13 23:49:59,331 ----------------------------------------------------------------------------------------------------
2023-10-13 23:50:08,784 epoch 8 - iter 198/1984 - loss 0.02230710 - time (sec): 9.45 - samples/sec: 1803.36 - lr: 0.000010 - momentum: 0.000000
2023-10-13 23:50:17,765 epoch 8 - iter 396/1984 - loss 0.01940484 - time (sec): 18.43 - samples/sec: 1809.43 - lr: 0.000009 - momentum: 0.000000
2023-10-13 23:50:26,817 epoch 8 - iter 594/1984 - loss 0.01764565 - time (sec): 27.48 - samples/sec: 1836.45 - lr: 0.000009 - momentum: 0.000000
2023-10-13 23:50:35,737 epoch 8 - iter 792/1984 - loss 0.01686537 - time (sec): 36.40 - samples/sec: 1833.82 - lr: 0.000009 - momentum: 0.000000
2023-10-13 23:50:44,795 epoch 8 - iter 990/1984 - loss 0.01829213 - time (sec): 45.46 - samples/sec: 1805.63 - lr: 0.000008 - momentum: 0.000000
2023-10-13 23:50:53,724 epoch 8 - iter 1188/1984 - loss 0.01820113 - time (sec): 54.39 - samples/sec: 1809.96 - lr: 0.000008 - momentum: 0.000000
2023-10-13 23:51:02,869 epoch 8 - iter 1386/1984 - loss 0.01748894 - time (sec): 63.54 - samples/sec: 1807.49 - lr: 0.000008 - momentum: 0.000000
2023-10-13 23:51:12,333 epoch 8 - iter 1584/1984 - loss 0.01719619 - time (sec): 73.00 - samples/sec: 1800.67 - lr: 0.000007 - momentum: 0.000000
2023-10-13 23:51:21,277 epoch 8 - iter 1782/1984 - loss 0.01734332 - time (sec): 81.94 - samples/sec: 1807.45 - lr: 0.000007 - momentum: 0.000000
2023-10-13 23:51:30,188 epoch 8 - iter 1980/1984 - loss 0.01756418 - time (sec): 90.86 - samples/sec: 1801.14 - lr: 0.000007 - momentum: 0.000000
2023-10-13 23:51:30,370 ----------------------------------------------------------------------------------------------------
2023-10-13 23:51:30,370 EPOCH 8 done: loss 0.0175 - lr: 0.000007
2023-10-13 23:51:33,748 DEV : loss 0.215502068400383 - f1-score (micro avg)  0.7633
2023-10-13 23:51:33,769 ----------------------------------------------------------------------------------------------------
2023-10-13 23:51:42,760 epoch 9 - iter 198/1984 - loss 0.01149768 - time (sec): 8.99 - samples/sec: 1769.82 - lr: 0.000006 - momentum: 0.000000
2023-10-13 23:51:51,819 epoch 9 - iter 396/1984 - loss 0.01515882 - time (sec): 18.05 - samples/sec: 1787.98 - lr: 0.000006 - momentum: 0.000000
2023-10-13 23:52:00,844 epoch 9 - iter 594/1984 - loss 0.01337974 - time (sec): 27.07 - samples/sec: 1824.64 - lr: 0.000006 - momentum: 0.000000
2023-10-13 23:52:09,839 epoch 9 - iter 792/1984 - loss 0.01205560 - time (sec): 36.07 - samples/sec: 1828.37 - lr: 0.000005 - momentum: 0.000000
2023-10-13 23:52:18,833 epoch 9 - iter 990/1984 - loss 0.01052937 - time (sec): 45.06 - samples/sec: 1821.80 - lr: 0.000005 - momentum: 0.000000
2023-10-13 23:52:27,846 epoch 9 - iter 1188/1984 - loss 0.01105230 - time (sec): 54.08 - samples/sec: 1822.13 - lr: 0.000005 - momentum: 0.000000
2023-10-13 23:52:36,854 epoch 9 - iter 1386/1984 - loss 0.01084377 - time (sec): 63.08 - samples/sec: 1815.25 - lr: 0.000004 - momentum: 0.000000
2023-10-13 23:52:45,813 epoch 9 - iter 1584/1984 - loss 0.01116629 - time (sec): 72.04 - samples/sec: 1817.09 - lr: 0.000004 - momentum: 0.000000
2023-10-13 23:52:54,838 epoch 9 - iter 1782/1984 - loss 0.01094512 - time (sec): 81.07 - samples/sec: 1812.73 - lr: 0.000004 - momentum: 0.000000
2023-10-13 23:53:03,779 epoch 9 - iter 1980/1984 - loss 0.01048321 - time (sec): 90.01 - samples/sec: 1818.01 - lr: 0.000003 - momentum: 0.000000
2023-10-13 23:53:03,957 ----------------------------------------------------------------------------------------------------
2023-10-13 23:53:03,957 EPOCH 9 done: loss 0.0105 - lr: 0.000003
2023-10-13 23:53:07,359 DEV : loss 0.2330678254365921 - f1-score (micro avg)  0.7632
2023-10-13 23:53:07,381 ----------------------------------------------------------------------------------------------------
2023-10-13 23:53:16,538 epoch 10 - iter 198/1984 - loss 0.00727112 - time (sec): 9.16 - samples/sec: 1873.57 - lr: 0.000003 - momentum: 0.000000
2023-10-13 23:53:25,529 epoch 10 - iter 396/1984 - loss 0.00644925 - time (sec): 18.15 - samples/sec: 1829.60 - lr: 0.000003 - momentum: 0.000000
2023-10-13 23:53:34,573 epoch 10 - iter 594/1984 - loss 0.00644003 - time (sec): 27.19 - samples/sec: 1831.35 - lr: 0.000002 - momentum: 0.000000
2023-10-13 23:53:44,140 epoch 10 - iter 792/1984 - loss 0.00614650 - time (sec): 36.76 - samples/sec: 1815.49 - lr: 0.000002 - momentum: 0.000000
2023-10-13 23:53:53,201 epoch 10 - iter 990/1984 - loss 0.00665805 - time (sec): 45.82 - samples/sec: 1833.66 - lr: 0.000002 - momentum: 0.000000
2023-10-13 23:54:02,072 epoch 10 - iter 1188/1984 - loss 0.00708536 - time (sec): 54.69 - samples/sec: 1829.89 - lr: 0.000001 - momentum: 0.000000
2023-10-13 23:54:10,915 epoch 10 - iter 1386/1984 - loss 0.00771984 - time (sec): 63.53 - samples/sec: 1817.53 - lr: 0.000001 - momentum: 0.000000
2023-10-13 23:54:19,803 epoch 10 - iter 1584/1984 - loss 0.00781777 - time (sec): 72.42 - samples/sec: 1807.47 - lr: 0.000001 - momentum: 0.000000
2023-10-13 23:54:28,689 epoch 10 - iter 1782/1984 - loss 0.00737069 - time (sec): 81.31 - samples/sec: 1806.15 - lr: 0.000000 - momentum: 0.000000
2023-10-13 23:54:38,294 epoch 10 - iter 1980/1984 - loss 0.00694992 - time (sec): 90.91 - samples/sec: 1800.46 - lr: 0.000000 - momentum: 0.000000
2023-10-13 23:54:38,470 ----------------------------------------------------------------------------------------------------
2023-10-13 23:54:38,470 EPOCH 10 done: loss 0.0069 - lr: 0.000000
2023-10-13 23:54:41,942 DEV : loss 0.23772144317626953 - f1-score (micro avg)  0.7709
2023-10-13 23:54:42,395 ----------------------------------------------------------------------------------------------------
2023-10-13 23:54:42,396 Loading model from best epoch ...
2023-10-13 23:54:43,833 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-13 23:54:47,114 
Results:
- F-score (micro) 0.7789
- F-score (macro) 0.6828
- Accuracy 0.6588

By class:
              precision    recall  f1-score   support

         LOC     0.8215    0.8641    0.8423       655
         PER     0.6963    0.8430    0.7627       223
         ORG     0.4951    0.4016    0.4435       127

   micro avg     0.7580    0.8010    0.7789      1005
   macro avg     0.6710    0.7029    0.6828      1005
weighted avg     0.7525    0.8010    0.7742      1005

2023-10-13 23:54:47,115 ----------------------------------------------------------------------------------------------------