2023-10-24 10:02:59,640 ----------------------------------------------------------------------------------------------------
2023-10-24 10:02:59,641 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(64001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (1): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (2): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (3): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (4): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (5): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (6): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (7): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (8): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (9): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (10): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (11): BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=21, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-24 10:02:59,641 ----------------------------------------------------------------------------------------------------
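The printed model is a plain token-classification head: the BERT embeddings feed a LockedDropout and a single Linear layer mapping 768 dimensions to 21 tags, with no CRF or RNN. For reference, a minimal Flair sketch of how such a tagger is typically assembled; the checkpoint name is inferred from the training base path logged below, and keyword arguments may differ slightly across Flair versions:

from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger

# Checkpoint inferred from the base path below (an assumption, not logged explicitly).
embeddings = TransformerWordEmbeddings(
    model="dbmdz/bert-base-historic-multilingual-64k-td-cased",
    layers="-1",               # "layers-1" in the base path: last layer only
    subtoken_pooling="first",  # "poolingfirst" in the base path
    fine_tune=True,
)

# label_dictionary comes from the corpus (see the loading sketch below).
tagger = SequenceTagger(
    hidden_size=256,                # unused without an RNN, kept for the signature
    embeddings=embeddings,
    tag_dictionary=label_dictionary,
    tag_type="ner",
    use_crf=False,                  # "crfFalse" in the base path; matches Linear + CrossEntropyLoss
    use_rnn=False,
    reproject_embeddings=False,     # no reprojection layer appears in the printed model
)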
2023-10-24 10:02:59,641 MultiCorpus: 5901 train + 1287 dev + 1505 test sentences
- NER_HIPE_2022 Corpus: 5901 train + 1287 dev + 1505 test sentences - /home/ubuntu/.flair/datasets/ner_hipe_2022/v2.1/hipe2020/fr/with_doc_seperator
2023-10-24 10:02:59,641 ----------------------------------------------------------------------------------------------------
2023-10-24 10:02:59,641 Train: 5901 sentences
2023-10-24 10:02:59,641 (train_with_dev=False, train_with_test=False)
2023-10-24 10:02:59,641 ----------------------------------------------------------------------------------------------------
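For reproducibility, this corpus can be loaded through Flair's HIPE-2022 loader. A minimal sketch, assuming a recent Flair release in which NER_HIPE_2022 exposes the dataset_name and language parameters as shown:

from flair.datasets import NER_HIPE_2022

# HIPE-2020 French subset of the HIPE-2022 shared-task data
# (5901 train / 1287 dev / 1505 test sentences, as logged above).
corpus = NER_HIPE_2022(dataset_name="hipe2020", language="fr")
label_dictionary = corpus.make_label_dictionary(label_type="ner")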
2023-10-24 10:02:59,641 Training Params:
2023-10-24 10:02:59,641 - learning_rate: "3e-05"
2023-10-24 10:02:59,641 - mini_batch_size: "8"
2023-10-24 10:02:59,641 - max_epochs: "10"
2023-10-24 10:02:59,641 - shuffle: "True"
2023-10-24 10:02:59,641 ----------------------------------------------------------------------------------------------------
2023-10-24 10:02:59,641 Plugins:
2023-10-24 10:02:59,641 - TensorboardLogger
2023-10-24 10:02:59,642 - LinearScheduler | warmup_fraction: '0.1'
2023-10-24 10:02:59,642 ----------------------------------------------------------------------------------------------------
2023-10-24 10:02:59,642 Final evaluation on model from best epoch (best-model.pt)
2023-10-24 10:02:59,642 - metric: "('micro avg', 'f1-score')"
2023-10-24 10:02:59,642 ----------------------------------------------------------------------------------------------------
2023-10-24 10:02:59,642 Computation:
2023-10-24 10:02:59,642 - compute on device: cuda:0
2023-10-24 10:02:59,642 - embedding storage: none
2023-10-24 10:02:59,642 ----------------------------------------------------------------------------------------------------
2023-10-24 10:02:59,642 Model training base path: "hmbench-hipe2020/fr-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-2"
2023-10-24 10:02:59,642 ----------------------------------------------------------------------------------------------------
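Taken together, the logged hyperparameters correspond to Flair's standard fine-tuning entry point. A sketch, under the assumption that fine_tune's defaults (AdamW, linear LR schedule with 10% warmup) match the plugins listed above; exact signatures vary between Flair versions:

from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)

# fine_tune applies a linear LR schedule; its warmup fraction of 0.1 corresponds
# to the LinearScheduler plugin logged above, and the zero momentum in the
# iteration logs below is consistent with AdamW.
trainer.fine_tune(
    "hmbench-hipe2020/fr-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-2",
    learning_rate=3e-05,
    mini_batch_size=8,
    max_epochs=10,
)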
2023-10-24 10:02:59,642 ----------------------------------------------------------------------------------------------------
2023-10-24 10:02:59,642 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-24 10:03:05,895 epoch 1 - iter 73/738 - loss 2.42550890 - time (sec): 6.25 - samples/sec: 2471.21 - lr: 0.000003 - momentum: 0.000000
2023-10-24 10:03:12,507 epoch 1 - iter 146/738 - loss 1.50600625 - time (sec): 12.86 - samples/sec: 2418.68 - lr: 0.000006 - momentum: 0.000000
2023-10-24 10:03:19,533 epoch 1 - iter 219/738 - loss 1.12767070 - time (sec): 19.89 - samples/sec: 2397.52 - lr: 0.000009 - momentum: 0.000000
2023-10-24 10:03:26,301 epoch 1 - iter 292/738 - loss 0.93406611 - time (sec): 26.66 - samples/sec: 2376.48 - lr: 0.000012 - momentum: 0.000000
2023-10-24 10:03:33,259 epoch 1 - iter 365/738 - loss 0.80134282 - time (sec): 33.62 - samples/sec: 2375.26 - lr: 0.000015 - momentum: 0.000000
2023-10-24 10:03:39,853 epoch 1 - iter 438/738 - loss 0.71149145 - time (sec): 40.21 - samples/sec: 2361.86 - lr: 0.000018 - momentum: 0.000000
2023-10-24 10:03:47,300 epoch 1 - iter 511/738 - loss 0.63540461 - time (sec): 47.66 - samples/sec: 2356.51 - lr: 0.000021 - momentum: 0.000000
2023-10-24 10:03:53,988 epoch 1 - iter 584/738 - loss 0.57914697 - time (sec): 54.35 - samples/sec: 2358.43 - lr: 0.000024 - momentum: 0.000000
2023-10-24 10:04:01,580 epoch 1 - iter 657/738 - loss 0.53176914 - time (sec): 61.94 - samples/sec: 2360.80 - lr: 0.000027 - momentum: 0.000000
2023-10-24 10:04:09,449 epoch 1 - iter 730/738 - loss 0.49158401 - time (sec): 69.81 - samples/sec: 2357.65 - lr: 0.000030 - momentum: 0.000000
2023-10-24 10:04:10,192 ----------------------------------------------------------------------------------------------------
2023-10-24 10:04:10,192 EPOCH 1 done: loss 0.4878 - lr: 0.000030
2023-10-24 10:04:16,415 DEV : loss 0.10594037920236588 - f1-score (micro avg) 0.7283
2023-10-24 10:04:16,436 saving best model
2023-10-24 10:04:16,986 ----------------------------------------------------------------------------------------------------
2023-10-24 10:04:23,926 epoch 2 - iter 73/738 - loss 0.14091494 - time (sec): 6.94 - samples/sec: 2335.41 - lr: 0.000030 - momentum: 0.000000
2023-10-24 10:04:31,241 epoch 2 - iter 146/738 - loss 0.12231896 - time (sec): 14.25 - samples/sec: 2342.68 - lr: 0.000029 - momentum: 0.000000
2023-10-24 10:04:37,881 epoch 2 - iter 219/738 - loss 0.12111484 - time (sec): 20.89 - samples/sec: 2339.29 - lr: 0.000029 - momentum: 0.000000
2023-10-24 10:04:45,345 epoch 2 - iter 292/738 - loss 0.12184837 - time (sec): 28.36 - samples/sec: 2317.62 - lr: 0.000029 - momentum: 0.000000
2023-10-24 10:04:52,383 epoch 2 - iter 365/738 - loss 0.12080821 - time (sec): 35.40 - samples/sec: 2342.50 - lr: 0.000028 - momentum: 0.000000
2023-10-24 10:04:59,079 epoch 2 - iter 438/738 - loss 0.11532571 - time (sec): 42.09 - samples/sec: 2347.84 - lr: 0.000028 - momentum: 0.000000
2023-10-24 10:05:06,120 epoch 2 - iter 511/738 - loss 0.11512762 - time (sec): 49.13 - samples/sec: 2336.44 - lr: 0.000028 - momentum: 0.000000
2023-10-24 10:05:13,584 epoch 2 - iter 584/738 - loss 0.11551858 - time (sec): 56.60 - samples/sec: 2347.44 - lr: 0.000027 - momentum: 0.000000
2023-10-24 10:05:20,545 epoch 2 - iter 657/738 - loss 0.11451103 - time (sec): 63.56 - samples/sec: 2343.30 - lr: 0.000027 - momentum: 0.000000
2023-10-24 10:05:27,102 epoch 2 - iter 730/738 - loss 0.11268099 - time (sec): 70.11 - samples/sec: 2353.65 - lr: 0.000027 - momentum: 0.000000
2023-10-24 10:05:27,733 ----------------------------------------------------------------------------------------------------
2023-10-24 10:05:27,734 EPOCH 2 done: loss 0.1124 - lr: 0.000027
2023-10-24 10:05:36,219 DEV : loss 0.1031927615404129 - f1-score (micro avg) 0.8039
2023-10-24 10:05:36,241 saving best model
2023-10-24 10:05:36,964 ----------------------------------------------------------------------------------------------------
2023-10-24 10:05:43,681 epoch 3 - iter 73/738 - loss 0.06142961 - time (sec): 6.72 - samples/sec: 2392.58 - lr: 0.000026 - momentum: 0.000000
2023-10-24 10:05:50,369 epoch 3 - iter 146/738 - loss 0.06320856 - time (sec): 13.40 - samples/sec: 2398.13 - lr: 0.000026 - momentum: 0.000000
2023-10-24 10:05:57,239 epoch 3 - iter 219/738 - loss 0.06406728 - time (sec): 20.27 - samples/sec: 2350.76 - lr: 0.000026 - momentum: 0.000000
2023-10-24 10:06:04,920 epoch 3 - iter 292/738 - loss 0.06949115 - time (sec): 27.95 - samples/sec: 2363.73 - lr: 0.000025 - momentum: 0.000000
2023-10-24 10:06:12,095 epoch 3 - iter 365/738 - loss 0.06948464 - time (sec): 35.13 - samples/sec: 2368.81 - lr: 0.000025 - momentum: 0.000000
2023-10-24 10:06:18,669 epoch 3 - iter 438/738 - loss 0.06584312 - time (sec): 41.70 - samples/sec: 2376.09 - lr: 0.000025 - momentum: 0.000000
2023-10-24 10:06:25,135 epoch 3 - iter 511/738 - loss 0.06518111 - time (sec): 48.17 - samples/sec: 2382.47 - lr: 0.000024 - momentum: 0.000000
2023-10-24 10:06:32,856 epoch 3 - iter 584/738 - loss 0.06469238 - time (sec): 55.89 - samples/sec: 2372.12 - lr: 0.000024 - momentum: 0.000000
2023-10-24 10:06:39,410 epoch 3 - iter 657/738 - loss 0.06589124 - time (sec): 62.45 - samples/sec: 2374.79 - lr: 0.000024 - momentum: 0.000000
2023-10-24 10:06:46,704 epoch 3 - iter 730/738 - loss 0.06610678 - time (sec): 69.74 - samples/sec: 2361.48 - lr: 0.000023 - momentum: 0.000000
2023-10-24 10:06:47,394 ----------------------------------------------------------------------------------------------------
2023-10-24 10:06:47,394 EPOCH 3 done: loss 0.0660 - lr: 0.000023
2023-10-24 10:06:55,870 DEV : loss 0.10477666556835175 - f1-score (micro avg) 0.822
2023-10-24 10:06:55,892 saving best model
2023-10-24 10:06:56,591 ----------------------------------------------------------------------------------------------------
2023-10-24 10:07:03,333 epoch 4 - iter 73/738 - loss 0.04080442 - time (sec): 6.74 - samples/sec: 2324.73 - lr: 0.000023 - momentum: 0.000000
2023-10-24 10:07:11,372 epoch 4 - iter 146/738 - loss 0.04335989 - time (sec): 14.78 - samples/sec: 2266.77 - lr: 0.000023 - momentum: 0.000000
2023-10-24 10:07:18,506 epoch 4 - iter 219/738 - loss 0.04233301 - time (sec): 21.91 - samples/sec: 2373.52 - lr: 0.000022 - momentum: 0.000000
2023-10-24 10:07:25,615 epoch 4 - iter 292/738 - loss 0.04153088 - time (sec): 29.02 - samples/sec: 2358.09 - lr: 0.000022 - momentum: 0.000000
2023-10-24 10:07:32,101 epoch 4 - iter 365/738 - loss 0.04133152 - time (sec): 35.51 - samples/sec: 2370.06 - lr: 0.000022 - momentum: 0.000000
2023-10-24 10:07:39,193 epoch 4 - iter 438/738 - loss 0.04256243 - time (sec): 42.60 - samples/sec: 2370.12 - lr: 0.000021 - momentum: 0.000000
2023-10-24 10:07:46,147 epoch 4 - iter 511/738 - loss 0.04221785 - time (sec): 49.56 - samples/sec: 2352.83 - lr: 0.000021 - momentum: 0.000000
2023-10-24 10:07:53,003 epoch 4 - iter 584/738 - loss 0.04300245 - time (sec): 56.41 - samples/sec: 2352.17 - lr: 0.000021 - momentum: 0.000000
2023-10-24 10:08:00,219 epoch 4 - iter 657/738 - loss 0.04276968 - time (sec): 63.63 - samples/sec: 2342.79 - lr: 0.000020 - momentum: 0.000000
2023-10-24 10:08:06,859 epoch 4 - iter 730/738 - loss 0.04317653 - time (sec): 70.27 - samples/sec: 2342.56 - lr: 0.000020 - momentum: 0.000000
2023-10-24 10:08:07,592 ----------------------------------------------------------------------------------------------------
2023-10-24 10:08:07,592 EPOCH 4 done: loss 0.0432 - lr: 0.000020
2023-10-24 10:08:16,095 DEV : loss 0.152599036693573 - f1-score (micro avg) 0.8181
2023-10-24 10:08:16,116 ----------------------------------------------------------------------------------------------------
2023-10-24 10:08:23,270 epoch 5 - iter 73/738 - loss 0.03851342 - time (sec): 7.15 - samples/sec: 2259.95 - lr: 0.000020 - momentum: 0.000000
2023-10-24 10:08:30,725 epoch 5 - iter 146/738 - loss 0.02771039 - time (sec): 14.61 - samples/sec: 2337.46 - lr: 0.000019 - momentum: 0.000000
2023-10-24 10:08:37,294 epoch 5 - iter 219/738 - loss 0.03124182 - time (sec): 21.18 - samples/sec: 2370.01 - lr: 0.000019 - momentum: 0.000000
2023-10-24 10:08:44,328 epoch 5 - iter 292/738 - loss 0.02900780 - time (sec): 28.21 - samples/sec: 2371.67 - lr: 0.000019 - momentum: 0.000000
2023-10-24 10:08:51,054 epoch 5 - iter 365/738 - loss 0.02939080 - time (sec): 34.94 - samples/sec: 2388.23 - lr: 0.000018 - momentum: 0.000000
2023-10-24 10:08:58,572 epoch 5 - iter 438/738 - loss 0.03158563 - time (sec): 42.45 - samples/sec: 2390.55 - lr: 0.000018 - momentum: 0.000000
2023-10-24 10:09:05,161 epoch 5 - iter 511/738 - loss 0.03343792 - time (sec): 49.04 - samples/sec: 2377.13 - lr: 0.000018 - momentum: 0.000000
2023-10-24 10:09:12,032 epoch 5 - iter 584/738 - loss 0.03263845 - time (sec): 55.91 - samples/sec: 2365.33 - lr: 0.000017 - momentum: 0.000000
2023-10-24 10:09:18,863 epoch 5 - iter 657/738 - loss 0.03330939 - time (sec): 62.75 - samples/sec: 2355.58 - lr: 0.000017 - momentum: 0.000000
2023-10-24 10:09:26,361 epoch 5 - iter 730/738 - loss 0.03263123 - time (sec): 70.24 - samples/sec: 2343.37 - lr: 0.000017 - momentum: 0.000000
2023-10-24 10:09:27,040 ----------------------------------------------------------------------------------------------------
2023-10-24 10:09:27,041 EPOCH 5 done: loss 0.0325 - lr: 0.000017
2023-10-24 10:09:35,593 DEV : loss 0.16575849056243896 - f1-score (micro avg) 0.8262
2023-10-24 10:09:35,615 saving best model
2023-10-24 10:09:36,323 ----------------------------------------------------------------------------------------------------
2023-10-24 10:09:44,530 epoch 6 - iter 73/738 - loss 0.02678751 - time (sec): 8.21 - samples/sec: 2432.33 - lr: 0.000016 - momentum: 0.000000
2023-10-24 10:09:50,761 epoch 6 - iter 146/738 - loss 0.02291770 - time (sec): 14.44 - samples/sec: 2413.80 - lr: 0.000016 - momentum: 0.000000
2023-10-24 10:09:58,454 epoch 6 - iter 219/738 - loss 0.02553640 - time (sec): 22.13 - samples/sec: 2340.90 - lr: 0.000016 - momentum: 0.000000
2023-10-24 10:10:04,889 epoch 6 - iter 292/738 - loss 0.02480548 - time (sec): 28.56 - samples/sec: 2332.14 - lr: 0.000015 - momentum: 0.000000
2023-10-24 10:10:11,608 epoch 6 - iter 365/738 - loss 0.02547611 - time (sec): 35.28 - samples/sec: 2353.19 - lr: 0.000015 - momentum: 0.000000
2023-10-24 10:10:18,635 epoch 6 - iter 438/738 - loss 0.02484851 - time (sec): 42.31 - samples/sec: 2356.94 - lr: 0.000015 - momentum: 0.000000
2023-10-24 10:10:25,292 epoch 6 - iter 511/738 - loss 0.02610251 - time (sec): 48.97 - samples/sec: 2361.14 - lr: 0.000014 - momentum: 0.000000
2023-10-24 10:10:31,906 epoch 6 - iter 584/738 - loss 0.02565136 - time (sec): 55.58 - samples/sec: 2358.71 - lr: 0.000014 - momentum: 0.000000
2023-10-24 10:10:38,251 epoch 6 - iter 657/738 - loss 0.02458488 - time (sec): 61.93 - samples/sec: 2358.32 - lr: 0.000014 - momentum: 0.000000
2023-10-24 10:10:45,681 epoch 6 - iter 730/738 - loss 0.02431977 - time (sec): 69.36 - samples/sec: 2365.19 - lr: 0.000013 - momentum: 0.000000
2023-10-24 10:10:46,741 ----------------------------------------------------------------------------------------------------
2023-10-24 10:10:46,741 EPOCH 6 done: loss 0.0242 - lr: 0.000013
2023-10-24 10:10:55,250 DEV : loss 0.19801990687847137 - f1-score (micro avg) 0.8271
2023-10-24 10:10:55,271 saving best model
2023-10-24 10:10:55,967 ----------------------------------------------------------------------------------------------------
2023-10-24 10:11:02,824 epoch 7 - iter 73/738 - loss 0.01696546 - time (sec): 6.86 - samples/sec: 2413.99 - lr: 0.000013 - momentum: 0.000000
2023-10-24 10:11:09,882 epoch 7 - iter 146/738 - loss 0.01294632 - time (sec): 13.91 - samples/sec: 2378.56 - lr: 0.000013 - momentum: 0.000000
2023-10-24 10:11:17,288 epoch 7 - iter 219/738 - loss 0.01398777 - time (sec): 21.32 - samples/sec: 2369.26 - lr: 0.000012 - momentum: 0.000000
2023-10-24 10:11:24,456 epoch 7 - iter 292/738 - loss 0.01547935 - time (sec): 28.49 - samples/sec: 2349.20 - lr: 0.000012 - momentum: 0.000000
2023-10-24 10:11:31,386 epoch 7 - iter 365/738 - loss 0.01507352 - time (sec): 35.42 - samples/sec: 2339.48 - lr: 0.000012 - momentum: 0.000000
2023-10-24 10:11:38,521 epoch 7 - iter 438/738 - loss 0.01688411 - time (sec): 42.55 - samples/sec: 2328.32 - lr: 0.000011 - momentum: 0.000000
2023-10-24 10:11:46,033 epoch 7 - iter 511/738 - loss 0.01735857 - time (sec): 50.06 - samples/sec: 2335.27 - lr: 0.000011 - momentum: 0.000000
2023-10-24 10:11:52,504 epoch 7 - iter 584/738 - loss 0.01711860 - time (sec): 56.54 - samples/sec: 2328.63 - lr: 0.000011 - momentum: 0.000000
2023-10-24 10:11:58,762 epoch 7 - iter 657/738 - loss 0.01671606 - time (sec): 62.79 - samples/sec: 2345.39 - lr: 0.000010 - momentum: 0.000000
2023-10-24 10:12:06,314 epoch 7 - iter 730/738 - loss 0.01629479 - time (sec): 70.35 - samples/sec: 2344.46 - lr: 0.000010 - momentum: 0.000000
2023-10-24 10:12:06,944 ----------------------------------------------------------------------------------------------------
2023-10-24 10:12:06,944 EPOCH 7 done: loss 0.0164 - lr: 0.000010
2023-10-24 10:12:15,453 DEV : loss 0.2032857984304428 - f1-score (micro avg) 0.8268
2023-10-24 10:12:15,475 ----------------------------------------------------------------------------------------------------
2023-10-24 10:12:22,424 epoch 8 - iter 73/738 - loss 0.01216183 - time (sec): 6.95 - samples/sec: 2287.40 - lr: 0.000010 - momentum: 0.000000
2023-10-24 10:12:29,512 epoch 8 - iter 146/738 - loss 0.01236948 - time (sec): 14.04 - samples/sec: 2336.90 - lr: 0.000009 - momentum: 0.000000
2023-10-24 10:12:37,062 epoch 8 - iter 219/738 - loss 0.01044188 - time (sec): 21.59 - samples/sec: 2309.04 - lr: 0.000009 - momentum: 0.000000
2023-10-24 10:12:43,982 epoch 8 - iter 292/738 - loss 0.01034487 - time (sec): 28.51 - samples/sec: 2346.41 - lr: 0.000009 - momentum: 0.000000
2023-10-24 10:12:51,543 epoch 8 - iter 365/738 - loss 0.01126341 - time (sec): 36.07 - samples/sec: 2368.99 - lr: 0.000008 - momentum: 0.000000
2023-10-24 10:12:58,262 epoch 8 - iter 438/738 - loss 0.01076147 - time (sec): 42.79 - samples/sec: 2353.67 - lr: 0.000008 - momentum: 0.000000
2023-10-24 10:13:05,389 epoch 8 - iter 511/738 - loss 0.01075803 - time (sec): 49.91 - samples/sec: 2354.08 - lr: 0.000008 - momentum: 0.000000
2023-10-24 10:13:12,279 epoch 8 - iter 584/738 - loss 0.01183127 - time (sec): 56.80 - samples/sec: 2342.58 - lr: 0.000007 - momentum: 0.000000
2023-10-24 10:13:19,173 epoch 8 - iter 657/738 - loss 0.01204421 - time (sec): 63.70 - samples/sec: 2347.40 - lr: 0.000007 - momentum: 0.000000
2023-10-24 10:13:25,559 epoch 8 - iter 730/738 - loss 0.01168359 - time (sec): 70.08 - samples/sec: 2351.74 - lr: 0.000007 - momentum: 0.000000
2023-10-24 10:13:26,318 ----------------------------------------------------------------------------------------------------
2023-10-24 10:13:26,318 EPOCH 8 done: loss 0.0116 - lr: 0.000007
2023-10-24 10:13:34,847 DEV : loss 0.19606834650039673 - f1-score (micro avg) 0.8411
2023-10-24 10:13:34,869 saving best model
2023-10-24 10:13:35,564 ----------------------------------------------------------------------------------------------------
2023-10-24 10:13:42,230 epoch 9 - iter 73/738 - loss 0.00325841 - time (sec): 6.67 - samples/sec: 2350.12 - lr: 0.000006 - momentum: 0.000000
2023-10-24 10:13:49,266 epoch 9 - iter 146/738 - loss 0.00560515 - time (sec): 13.70 - samples/sec: 2327.56 - lr: 0.000006 - momentum: 0.000000
2023-10-24 10:13:55,847 epoch 9 - iter 219/738 - loss 0.00871536 - time (sec): 20.28 - samples/sec: 2341.82 - lr: 0.000006 - momentum: 0.000000
2023-10-24 10:14:03,005 epoch 9 - iter 292/738 - loss 0.00745041 - time (sec): 27.44 - samples/sec: 2358.00 - lr: 0.000005 - momentum: 0.000000
2023-10-24 10:14:10,081 epoch 9 - iter 365/738 - loss 0.00721985 - time (sec): 34.52 - samples/sec: 2337.43 - lr: 0.000005 - momentum: 0.000000
2023-10-24 10:14:16,553 epoch 9 - iter 438/738 - loss 0.00827858 - time (sec): 40.99 - samples/sec: 2344.81 - lr: 0.000005 - momentum: 0.000000
2023-10-24 10:14:22,960 epoch 9 - iter 511/738 - loss 0.00832065 - time (sec): 47.39 - samples/sec: 2343.01 - lr: 0.000004 - momentum: 0.000000
2023-10-24 10:14:30,542 epoch 9 - iter 584/738 - loss 0.00748758 - time (sec): 54.98 - samples/sec: 2349.55 - lr: 0.000004 - momentum: 0.000000
2023-10-24 10:14:37,973 epoch 9 - iter 657/738 - loss 0.00872752 - time (sec): 62.41 - samples/sec: 2360.31 - lr: 0.000004 - momentum: 0.000000
2023-10-24 10:14:45,591 epoch 9 - iter 730/738 - loss 0.00832303 - time (sec): 70.03 - samples/sec: 2355.51 - lr: 0.000003 - momentum: 0.000000
2023-10-24 10:14:46,236 ----------------------------------------------------------------------------------------------------
2023-10-24 10:14:46,236 EPOCH 9 done: loss 0.0083 - lr: 0.000003
2023-10-24 10:14:54,743 DEV : loss 0.21085196733474731 - f1-score (micro avg) 0.8465
2023-10-24 10:14:54,765 saving best model
2023-10-24 10:14:55,461 ----------------------------------------------------------------------------------------------------
2023-10-24 10:15:03,114 epoch 10 - iter 73/738 - loss 0.00225549 - time (sec): 7.65 - samples/sec: 2254.22 - lr: 0.000003 - momentum: 0.000000
2023-10-24 10:15:09,825 epoch 10 - iter 146/738 - loss 0.00248018 - time (sec): 14.36 - samples/sec: 2313.87 - lr: 0.000003 - momentum: 0.000000
2023-10-24 10:15:16,491 epoch 10 - iter 219/738 - loss 0.00258571 - time (sec): 21.03 - samples/sec: 2308.26 - lr: 0.000002 - momentum: 0.000000
2023-10-24 10:15:23,656 epoch 10 - iter 292/738 - loss 0.00272149 - time (sec): 28.19 - samples/sec: 2317.62 - lr: 0.000002 - momentum: 0.000000
2023-10-24 10:15:31,083 epoch 10 - iter 365/738 - loss 0.00322358 - time (sec): 35.62 - samples/sec: 2357.62 - lr: 0.000002 - momentum: 0.000000
2023-10-24 10:15:38,019 epoch 10 - iter 438/738 - loss 0.00349996 - time (sec): 42.56 - samples/sec: 2352.67 - lr: 0.000001 - momentum: 0.000000
2023-10-24 10:15:45,376 epoch 10 - iter 511/738 - loss 0.00398963 - time (sec): 49.91 - samples/sec: 2355.98 - lr: 0.000001 - momentum: 0.000000
2023-10-24 10:15:52,559 epoch 10 - iter 584/738 - loss 0.00460257 - time (sec): 57.10 - samples/sec: 2352.17 - lr: 0.000001 - momentum: 0.000000
2023-10-24 10:15:58,908 epoch 10 - iter 657/738 - loss 0.00450873 - time (sec): 63.45 - samples/sec: 2352.91 - lr: 0.000000 - momentum: 0.000000
2023-10-24 10:16:05,753 epoch 10 - iter 730/738 - loss 0.00478145 - time (sec): 70.29 - samples/sec: 2345.44 - lr: 0.000000 - momentum: 0.000000
2023-10-24 10:16:06,439 ----------------------------------------------------------------------------------------------------
2023-10-24 10:16:06,440 EPOCH 10 done: loss 0.0048 - lr: 0.000000
2023-10-24 10:16:14,959 DEV : loss 0.21770432591438293 - f1-score (micro avg) 0.8466
2023-10-24 10:16:14,981 saving best model
2023-10-24 10:16:16,242 ----------------------------------------------------------------------------------------------------
2023-10-24 10:16:16,243 Loading model from best epoch ...
2023-10-24 10:16:18,060 SequenceTagger predicts: Dictionary with 21 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org, S-time, B-time, E-time, I-time, S-prod, B-prod, E-prod, I-prod
2023-10-24 10:16:24,729
Results:
- F-score (micro) 0.7923
- F-score (macro) 0.7091
- Accuracy 0.6784
By class:
              precision    recall  f1-score   support

         loc     0.8467    0.8753    0.8607       858
        pers     0.7404    0.7914    0.7651       537
         org     0.5532    0.5909    0.5714       132
        time     0.5217    0.6667    0.5854        54
        prod     0.7895    0.7377    0.7627        61

   micro avg     0.7726    0.8130    0.7923      1642
   macro avg     0.6903    0.7324    0.7091      1642
weighted avg     0.7755    0.8130    0.7935      1642
2023-10-24 10:16:24,729 ----------------------------------------------------------------------------------------------------
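The saved best-model.pt can be used for tagging with the standard Flair API. A minimal usage sketch; the French example sentence is purely illustrative:

from flair.data import Sentence
from flair.models import SequenceTagger

# Load the checkpoint written under the training base path above.
tagger = SequenceTagger.load(
    "hmbench-hipe2020/fr-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-2/best-model.pt"
)

sentence = Sentence("Victor Hugo est né à Besançon .")
tagger.predict(sentence)

# Entity spans are decoded from the BIOES tag set listed above
# (loc, pers, org, time, prod).
for span in sentence.get_spans("ner"):
    print(span)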