2023-10-24 10:02:59,640 ---------------------------------------------------------------------------------------------------- 2023-10-24 10:02:59,641 Model: "SequenceTagger( (embeddings): TransformerWordEmbeddings( (model): BertModel( (embeddings): BertEmbeddings( (word_embeddings): Embedding(64001, 768) (position_embeddings): Embedding(512, 768) (token_type_embeddings): Embedding(2, 768) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (encoder): BertEncoder( (layer): ModuleList( (0): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (1): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): 
GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (2): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (3): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (4): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, 
out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (5): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (6): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, 
elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (7): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (8): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), 
eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (9): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (10): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (11): BertLayer( (attention): BertAttention( (self): BertSelfAttention( (query): Linear(in_features=768, out_features=768, bias=True) (key): Linear(in_features=768, out_features=768, bias=True) (value): Linear(in_features=768, out_features=768, 
bias=True) (dropout): Dropout(p=0.1, inplace=False) ) (output): BertSelfOutput( (dense): Linear(in_features=768, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (intermediate): BertIntermediate( (dense): Linear(in_features=768, out_features=3072, bias=True) (intermediate_act_fn): GELUActivation() ) (output): BertOutput( (dense): Linear(in_features=3072, out_features=768, bias=True) (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) ) (pooler): BertPooler( (dense): Linear(in_features=768, out_features=768, bias=True) (activation): Tanh() ) ) ) (locked_dropout): LockedDropout(p=0.5) (linear): Linear(in_features=768, out_features=21, bias=True) (loss_function): CrossEntropyLoss() )"
2023-10-24 10:02:59,641 ----------------------------------------------------------------------------------------------------
2023-10-24 10:02:59,641 MultiCorpus: 5901 train + 1287 dev + 1505 test sentences
 - NER_HIPE_2022 Corpus: 5901 train + 1287 dev + 1505 test sentences - /home/ubuntu/.flair/datasets/ner_hipe_2022/v2.1/hipe2020/fr/with_doc_seperator
2023-10-24 10:02:59,641 ----------------------------------------------------------------------------------------------------
2023-10-24 10:02:59,641 Train: 5901 sentences
2023-10-24 10:02:59,641 (train_with_dev=False, train_with_test=False)
2023-10-24 10:02:59,641 ----------------------------------------------------------------------------------------------------
2023-10-24 10:02:59,641 Training Params:
2023-10-24 10:02:59,641 - learning_rate: "3e-05"
2023-10-24 10:02:59,641 - mini_batch_size: "8"
2023-10-24 10:02:59,641 - max_epochs: "10"
2023-10-24 10:02:59,641 - shuffle: "True"
2023-10-24 10:02:59,641 ----------------------------------------------------------------------------------------------------
2023-10-24 10:02:59,641 Plugins:
2023-10-24 10:02:59,641 - TensorboardLogger
2023-10-24 10:02:59,642 - LinearScheduler | warmup_fraction: '0.1'
2023-10-24 10:02:59,642 ----------------------------------------------------------------------------------------------------
2023-10-24 10:02:59,642 Final evaluation on model from best epoch (best-model.pt)
2023-10-24 10:02:59,642 - metric: "('micro avg', 'f1-score')"
2023-10-24 10:02:59,642 ----------------------------------------------------------------------------------------------------
2023-10-24 10:02:59,642 Computation:
2023-10-24 10:02:59,642 - compute on device: cuda:0
2023-10-24 10:02:59,642 - embedding storage: none
2023-10-24 10:02:59,642 ----------------------------------------------------------------------------------------------------
2023-10-24 10:02:59,642 Model training base path: "hmbench-hipe2020/fr-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-2"
2023-10-24 10:02:59,642 ----------------------------------------------------------------------------------------------------
2023-10-24 10:02:59,642 ----------------------------------------------------------------------------------------------------
2023-10-24 10:02:59,642 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-24 10:03:05,895 epoch 1 - iter 73/738 - loss 2.42550890 - time (sec): 6.25 - samples/sec: 2471.21 - lr: 0.000003 - momentum: 0.000000
2023-10-24 10:03:12,507 epoch 1 - iter 146/738 - loss 1.50600625 - time (sec): 12.86 - samples/sec: 2418.68 - lr: 0.000006 - momentum: 0.000000
2023-10-24 10:03:19,533 epoch 1 - iter 219/738 - loss 1.12767070 - time (sec): 19.89 - samples/sec: 2397.52 - lr: 0.000009 - momentum: 0.000000
2023-10-24 10:03:26,301 epoch 1 - iter 292/738 - loss 0.93406611 - time (sec): 26.66 - samples/sec: 2376.48 - lr: 0.000012 - momentum: 0.000000
2023-10-24 10:03:33,259 epoch 1 - iter 365/738 - loss 0.80134282 - time (sec): 33.62 - samples/sec: 2375.26 - lr: 0.000015 - momentum: 0.000000
2023-10-24 10:03:39,853 epoch 1 - iter 438/738 - loss 0.71149145 - time (sec): 40.21 - samples/sec: 2361.86 - lr: 0.000018 - momentum: 0.000000
2023-10-24 10:03:47,300 epoch 1 - iter 511/738 - loss 0.63540461 - time (sec): 47.66 - samples/sec: 2356.51 - lr: 0.000021 - momentum: 0.000000
2023-10-24 10:03:53,988 epoch 1 - iter 584/738 - loss 0.57914697 - time (sec): 54.35 - samples/sec: 2358.43 - lr: 0.000024 - momentum: 0.000000
2023-10-24 10:04:01,580 epoch 1 - iter 657/738 - loss 0.53176914 - time (sec): 61.94 - samples/sec: 2360.80 - lr: 0.000027 - momentum: 0.000000
2023-10-24 10:04:09,449 epoch 1 - iter 730/738 - loss 0.49158401 - time (sec): 69.81 - samples/sec: 2357.65 - lr: 0.000030 - momentum: 0.000000
2023-10-24 10:04:10,192 ----------------------------------------------------------------------------------------------------
2023-10-24 10:04:10,192 EPOCH 1 done: loss 0.4878 - lr: 0.000030
2023-10-24 10:04:16,415 DEV : loss 0.10594037920236588 - f1-score (micro avg) 0.7283
2023-10-24 10:04:16,436 saving best model
2023-10-24 10:04:16,986 ----------------------------------------------------------------------------------------------------
2023-10-24 10:04:23,926 epoch 2 - iter 73/738 - loss 0.14091494 - time (sec): 6.94 - samples/sec: 2335.41 - lr: 0.000030 - momentum: 0.000000
2023-10-24 10:04:31,241 epoch 2 - iter 146/738 - loss 0.12231896 - time (sec): 14.25 - samples/sec: 2342.68 - lr: 0.000029 - momentum: 0.000000
2023-10-24 10:04:37,881 epoch 2 - iter 219/738 - loss 0.12111484 - time (sec): 20.89 - samples/sec: 2339.29 - lr: 0.000029 - momentum: 0.000000
2023-10-24 10:04:45,345 epoch 2 - iter 292/738 - loss 0.12184837 - time (sec): 28.36 - samples/sec: 2317.62 - lr: 0.000029 - momentum: 0.000000
2023-10-24 10:04:52,383 epoch 2 - iter 365/738 - loss 0.12080821 - time (sec): 35.40 - samples/sec: 2342.50 - lr: 0.000028 - momentum: 0.000000
2023-10-24 10:04:59,079 epoch 2 - iter 438/738 - loss 0.11532571 - time (sec): 42.09 - samples/sec: 2347.84 - lr: 0.000028 - momentum: 0.000000
2023-10-24 10:05:06,120 epoch 2 - iter 511/738 - loss 0.11512762 - time (sec): 49.13 - samples/sec: 2336.44 - lr: 0.000028 - momentum: 0.000000
2023-10-24 10:05:13,584 epoch 2 - iter 584/738 - loss 0.11551858 - time (sec): 56.60 - samples/sec: 2347.44 - lr: 0.000027 - momentum: 0.000000
2023-10-24 10:05:20,545 epoch 2 - iter 657/738 - loss 0.11451103 - time (sec): 63.56 - samples/sec: 2343.30 - lr: 0.000027 - momentum: 0.000000
2023-10-24 10:05:27,102 epoch 2 - iter 730/738 - loss 0.11268099 - time (sec): 70.11 - samples/sec: 2353.65 - lr: 0.000027 - momentum: 0.000000
2023-10-24 10:05:27,733 ----------------------------------------------------------------------------------------------------
2023-10-24 10:05:27,734 EPOCH 2 done: loss 0.1124 - lr: 0.000027
2023-10-24 10:05:36,219 DEV : loss 0.1031927615404129 - f1-score (micro avg) 0.8039
2023-10-24 10:05:36,241 saving best model
2023-10-24 10:05:36,964 ----------------------------------------------------------------------------------------------------
2023-10-24 10:05:43,681 epoch 3 - iter 73/738 - loss 0.06142961 - time (sec): 6.72 - samples/sec: 2392.58 - lr: 0.000026 - momentum: 0.000000
2023-10-24 10:05:50,369 epoch 3 - iter 146/738 - loss 0.06320856 - time (sec): 13.40 - samples/sec: 2398.13 - lr: 0.000026 - momentum: 0.000000
2023-10-24 10:05:57,239 epoch 3 - iter 219/738 - loss 0.06406728 - time (sec): 20.27 - samples/sec: 2350.76 - lr: 0.000026 - momentum: 0.000000
2023-10-24 10:06:04,920 epoch 3 - iter 292/738 - loss 0.06949115 - time (sec): 27.95 - samples/sec: 2363.73 - lr: 0.000025 - momentum: 0.000000
2023-10-24 10:06:12,095 epoch 3 - iter 365/738 - loss 0.06948464 - time (sec): 35.13 - samples/sec: 2368.81 - lr: 0.000025 - momentum: 0.000000
2023-10-24 10:06:18,669 epoch 3 - iter 438/738 - loss 0.06584312 - time (sec): 41.70 - samples/sec: 2376.09 - lr: 0.000025 - momentum: 0.000000
2023-10-24 10:06:25,135 epoch 3 - iter 511/738 - loss 0.06518111 - time (sec): 48.17 - samples/sec: 2382.47 - lr: 0.000024 - momentum: 0.000000
2023-10-24 10:06:32,856 epoch 3 - iter 584/738 - loss 0.06469238 - time (sec): 55.89 - samples/sec: 2372.12 - lr: 0.000024 - momentum: 0.000000
2023-10-24 10:06:39,410 epoch 3 - iter 657/738 - loss 0.06589124 - time (sec): 62.45 - samples/sec: 2374.79 - lr: 0.000024 - momentum: 0.000000
2023-10-24 10:06:46,704 epoch 3 - iter 730/738 - loss 0.06610678 - time (sec): 69.74 - samples/sec: 2361.48 - lr: 0.000023 - momentum: 0.000000
2023-10-24 10:06:47,394 ----------------------------------------------------------------------------------------------------
2023-10-24 10:06:47,394 EPOCH 3 done: loss 0.0660 - lr: 0.000023
2023-10-24 10:06:55,870 DEV : loss 0.10477666556835175 - f1-score (micro avg) 0.822
2023-10-24 10:06:55,892 saving best model
2023-10-24 10:06:56,591 ----------------------------------------------------------------------------------------------------
2023-10-24 10:07:03,333 epoch 4 - iter 73/738 - loss 0.04080442 - time (sec): 6.74 - samples/sec: 2324.73 - lr: 0.000023 - momentum: 0.000000
2023-10-24 10:07:11,372 epoch 4 - iter 146/738 - loss 0.04335989 - time (sec): 14.78 - samples/sec: 2266.77 - lr: 0.000023 - momentum: 0.000000
2023-10-24 10:07:18,506 epoch 4 - iter 219/738 - loss 0.04233301 - time (sec): 21.91 - samples/sec: 2373.52 - lr: 0.000022 - momentum: 0.000000
2023-10-24 10:07:25,615 epoch 4 - iter 292/738 - loss 0.04153088 - time (sec): 29.02 - samples/sec: 2358.09 - lr: 0.000022 - momentum: 0.000000
2023-10-24 10:07:32,101 epoch 4 - iter 365/738 - loss 0.04133152 - time (sec): 35.51 - samples/sec: 2370.06 - lr: 0.000022 - momentum: 0.000000
2023-10-24 10:07:39,193 epoch 4 - iter 438/738 - loss 0.04256243 - time (sec): 42.60 - samples/sec: 2370.12 - lr: 0.000021 - momentum: 0.000000
2023-10-24 10:07:46,147 epoch 4 - iter 511/738 - loss 0.04221785 - time (sec): 49.56 - samples/sec: 2352.83 - lr: 0.000021 - momentum: 0.000000
2023-10-24 10:07:53,003 epoch 4 - iter 584/738 - loss 0.04300245 - time (sec): 56.41 - samples/sec: 2352.17 - lr: 0.000021 - momentum: 0.000000
2023-10-24 10:08:00,219 epoch 4 - iter 657/738 - loss 0.04276968 - time (sec): 63.63 - samples/sec: 2342.79 - lr: 0.000020 - momentum: 0.000000
2023-10-24 10:08:06,859 epoch 4 - iter 730/738 - loss 0.04317653 - time (sec): 70.27 - samples/sec: 2342.56 - lr: 0.000020 - momentum: 0.000000
2023-10-24 10:08:07,592 ----------------------------------------------------------------------------------------------------
2023-10-24 10:08:07,592 EPOCH 4 done: loss 0.0432 - lr: 0.000020
2023-10-24 10:08:16,095 DEV : loss 0.152599036693573 - f1-score (micro avg) 0.8181
2023-10-24 10:08:16,116 ----------------------------------------------------------------------------------------------------
2023-10-24 10:08:23,270 epoch 5 - iter 73/738 - loss 0.03851342 - time (sec): 7.15 - samples/sec: 2259.95 - lr: 0.000020 - momentum: 0.000000
2023-10-24 10:08:30,725 epoch 5 - iter 146/738 - loss 0.02771039 - time (sec): 14.61 - samples/sec: 2337.46 - lr: 0.000019 - momentum: 0.000000
2023-10-24 10:08:37,294 epoch 5 - iter 219/738 - loss 0.03124182 - time (sec): 21.18 - samples/sec: 2370.01 - lr: 0.000019 - momentum: 0.000000
2023-10-24 10:08:44,328 epoch 5 - iter 292/738 - loss 0.02900780 - time (sec): 28.21 - samples/sec: 2371.67 - lr: 0.000019 - momentum: 0.000000
2023-10-24 10:08:51,054 epoch 5 - iter 365/738 - loss 0.02939080 - time (sec): 34.94 - samples/sec: 2388.23 - lr: 0.000018 - momentum: 0.000000
2023-10-24 10:08:58,572 epoch 5 - iter 438/738 - loss 0.03158563 - time (sec): 42.45 - samples/sec: 2390.55 - lr: 0.000018 - momentum: 0.000000
2023-10-24 10:09:05,161 epoch 5 - iter 511/738 - loss 0.03343792 - time (sec): 49.04 - samples/sec: 2377.13 - lr: 0.000018 - momentum: 0.000000
2023-10-24 10:09:12,032 epoch 5 - iter 584/738 - loss 0.03263845 - time (sec): 55.91 - samples/sec: 2365.33 - lr: 0.000017 - momentum: 0.000000
2023-10-24 10:09:18,863 epoch 5 - iter 657/738 - loss 0.03330939 - time (sec): 62.75 - samples/sec: 2355.58 - lr: 0.000017 - momentum: 0.000000
2023-10-24 10:09:26,361 epoch 5 - iter 730/738 - loss 0.03263123 - time (sec): 70.24 - samples/sec: 2343.37 - lr: 0.000017 - momentum: 0.000000
2023-10-24 10:09:27,040 ----------------------------------------------------------------------------------------------------
2023-10-24 10:09:27,041 EPOCH 5 done: loss 0.0325 - lr: 0.000017
2023-10-24 10:09:35,593 DEV : loss 0.16575849056243896 - f1-score (micro avg) 0.8262
2023-10-24 10:09:35,615 saving best model
2023-10-24 10:09:36,323 ----------------------------------------------------------------------------------------------------
2023-10-24 10:09:44,530 epoch 6 - iter 73/738 - loss 0.02678751 - time (sec): 8.21 - samples/sec: 2432.33 - lr: 0.000016 - momentum: 0.000000
2023-10-24 10:09:50,761 epoch 6 - iter 146/738 - loss 0.02291770 - time (sec): 14.44 - samples/sec: 2413.80 - lr: 0.000016 - momentum: 0.000000
2023-10-24 10:09:58,454 epoch 6 - iter 219/738 - loss 0.02553640 - time (sec): 22.13 - samples/sec: 2340.90 - lr: 0.000016 - momentum: 0.000000
2023-10-24 10:10:04,889 epoch 6 - iter 292/738 - loss 0.02480548 - time (sec): 28.56 - samples/sec: 2332.14 - lr: 0.000015 - momentum: 0.000000
2023-10-24 10:10:11,608 epoch 6 - iter 365/738 - loss 0.02547611 - time (sec): 35.28 - samples/sec: 2353.19 - lr: 0.000015 - momentum: 0.000000
2023-10-24 10:10:18,635 epoch 6 - iter 438/738 - loss 0.02484851 - time (sec): 42.31 - samples/sec: 2356.94 - lr: 0.000015 - momentum: 0.000000
2023-10-24 10:10:25,292 epoch 6 - iter 511/738 - loss 0.02610251 - time (sec): 48.97 - samples/sec: 2361.14 - lr: 0.000014 - momentum: 0.000000
2023-10-24 10:10:31,906 epoch 6 - iter 584/738 - loss 0.02565136 - time (sec): 55.58 - samples/sec: 2358.71 - lr: 0.000014 - momentum: 0.000000
2023-10-24 10:10:38,251 epoch 6 - iter 657/738 - loss 0.02458488 - time (sec): 61.93 - samples/sec: 2358.32 - lr: 0.000014 - momentum: 0.000000
2023-10-24 10:10:45,681 epoch 6 - iter 730/738 - loss 0.02431977 - time (sec): 69.36 - samples/sec: 2365.19 - lr: 0.000013 - momentum: 0.000000
2023-10-24 10:10:46,741 ----------------------------------------------------------------------------------------------------
2023-10-24 10:10:46,741 EPOCH 6 done: loss 0.0242 - lr: 0.000013
2023-10-24 10:10:55,250 DEV : loss 0.19801990687847137 - f1-score (micro avg) 0.8271
2023-10-24 10:10:55,271 saving best model
2023-10-24 10:10:55,967 ----------------------------------------------------------------------------------------------------
2023-10-24 10:11:02,824 epoch 7 - iter 73/738 - loss 0.01696546 - time (sec): 6.86 - samples/sec: 2413.99 - lr: 0.000013 - momentum: 0.000000
2023-10-24 10:11:09,882 epoch 7 - iter 146/738 - loss 0.01294632 - time (sec): 13.91 - samples/sec: 2378.56 - lr: 0.000013 - momentum: 0.000000
2023-10-24 10:11:17,288 epoch 7 - iter 219/738 - loss 0.01398777 - time (sec): 21.32 - samples/sec: 2369.26 - lr: 0.000012 - momentum: 0.000000
2023-10-24 10:11:24,456 epoch 7 - iter 292/738 - loss 0.01547935 - time (sec): 28.49 - samples/sec: 2349.20 - lr: 0.000012 - momentum: 0.000000
2023-10-24 10:11:31,386 epoch 7 - iter 365/738 - loss 0.01507352 - time (sec): 35.42 - samples/sec: 2339.48 - lr: 0.000012 - momentum: 0.000000
2023-10-24 10:11:38,521 epoch 7 - iter 438/738 - loss 0.01688411 - time (sec): 42.55 - samples/sec: 2328.32 - lr: 0.000011 - momentum: 0.000000
2023-10-24 10:11:46,033 epoch 7 - iter 511/738 - loss 0.01735857 - time (sec): 50.06 - samples/sec: 2335.27 - lr: 0.000011 - momentum: 0.000000
2023-10-24 10:11:52,504 epoch 7 - iter 584/738 - loss 0.01711860 - time (sec): 56.54 - samples/sec: 2328.63 - lr: 0.000011 - momentum: 0.000000
2023-10-24 10:11:58,762 epoch 7 - iter 657/738 - loss 0.01671606 - time (sec): 62.79 - samples/sec: 2345.39 - lr: 0.000010 - momentum: 0.000000
2023-10-24 10:12:06,314 epoch 7 - iter 730/738 - loss 0.01629479 - time (sec): 70.35 - samples/sec: 2344.46 - lr: 0.000010 - momentum: 0.000000
2023-10-24 10:12:06,944 ----------------------------------------------------------------------------------------------------
2023-10-24 10:12:06,944 EPOCH 7 done: loss 0.0164 - lr: 0.000010
2023-10-24 10:12:15,453 DEV : loss 0.2032857984304428 - f1-score (micro avg) 0.8268
2023-10-24 10:12:15,475 ----------------------------------------------------------------------------------------------------
2023-10-24 10:12:22,424 epoch 8 - iter 73/738 - loss 0.01216183 - time (sec): 6.95 - samples/sec: 2287.40 - lr: 0.000010 - momentum: 0.000000
2023-10-24 10:12:29,512 epoch 8 - iter 146/738 - loss 0.01236948 - time (sec): 14.04 - samples/sec: 2336.90 - lr: 0.000009 - momentum: 0.000000
2023-10-24 10:12:37,062 epoch 8 - iter 219/738 - loss 0.01044188 - time (sec): 21.59 - samples/sec: 2309.04 - lr: 0.000009 - momentum: 0.000000
2023-10-24 10:12:43,982 epoch 8 - iter 292/738 - loss 0.01034487 - time (sec): 28.51 - samples/sec: 2346.41 - lr: 0.000009 - momentum: 0.000000
2023-10-24 10:12:51,543 epoch 8 - iter 365/738 - loss 0.01126341 - time (sec): 36.07 - samples/sec: 2368.99 - lr: 0.000008 - momentum: 0.000000
2023-10-24 10:12:58,262 epoch 8 - iter 438/738 - loss 0.01076147 - time (sec): 42.79 - samples/sec: 2353.67 - lr: 0.000008 - momentum: 0.000000
2023-10-24 10:13:05,389 epoch 8 - iter 511/738 - loss 0.01075803 - time (sec): 49.91 - samples/sec: 2354.08 - lr: 0.000008 - momentum: 0.000000
2023-10-24 10:13:12,279 epoch 8 - iter 584/738 - loss 0.01183127 - time (sec): 56.80 - samples/sec: 2342.58 - lr: 0.000007 - momentum: 0.000000
2023-10-24 10:13:19,173 epoch 8 - iter 657/738 - loss 0.01204421 - time (sec): 63.70 - samples/sec: 2347.40 - lr: 0.000007 - momentum: 0.000000
2023-10-24 10:13:25,559 epoch 8 - iter 730/738 - loss 0.01168359 - time (sec): 70.08 - samples/sec: 2351.74 - lr: 0.000007 - momentum: 0.000000
2023-10-24 10:13:26,318 ----------------------------------------------------------------------------------------------------
2023-10-24 10:13:26,318 EPOCH 8 done: loss 0.0116 - lr: 0.000007
2023-10-24 10:13:34,847 DEV : loss 0.19606834650039673 - f1-score (micro avg) 0.8411
2023-10-24 10:13:34,869 saving best model
2023-10-24 10:13:35,564 ----------------------------------------------------------------------------------------------------
2023-10-24 10:13:42,230 epoch 9 - iter 73/738 - loss 0.00325841 - time (sec): 6.67 - samples/sec: 2350.12 - lr: 0.000006 - momentum: 0.000000
2023-10-24 10:13:49,266 epoch 9 - iter 146/738 - loss 0.00560515 - time (sec): 13.70 - samples/sec: 2327.56 - lr: 0.000006 - momentum: 0.000000
2023-10-24 10:13:55,847 epoch 9 - iter 219/738 - loss 0.00871536 - time (sec): 20.28 - samples/sec: 2341.82 - lr: 0.000006 - momentum: 0.000000
2023-10-24 10:14:03,005 epoch 9 - iter 292/738 - loss 0.00745041 - time (sec): 27.44 - samples/sec: 2358.00 - lr: 0.000005 - momentum: 0.000000
2023-10-24 10:14:10,081 epoch 9 - iter 365/738 - loss 0.00721985 - time (sec): 34.52 - samples/sec: 2337.43 - lr: 0.000005 - momentum: 0.000000
2023-10-24 10:14:16,553 epoch 9 - iter 438/738 - loss 0.00827858 - time (sec): 40.99 - samples/sec: 2344.81 - lr: 0.000005 - momentum: 0.000000
2023-10-24 10:14:22,960 epoch 9 - iter 511/738 - loss 0.00832065 - time (sec): 47.39 - samples/sec: 2343.01 - lr: 0.000004 - momentum: 0.000000
2023-10-24 10:14:30,542 epoch 9 - iter 584/738 - loss 0.00748758 - time (sec): 54.98 - samples/sec: 2349.55 - lr: 0.000004 - momentum: 0.000000
2023-10-24 10:14:37,973 epoch 9 - iter 657/738 - loss 0.00872752 - time (sec): 62.41 - samples/sec: 2360.31 - lr: 0.000004 - momentum: 0.000000
2023-10-24 10:14:45,591 epoch 9 - iter 730/738 - loss 0.00832303 - time (sec): 70.03 - samples/sec: 2355.51 - lr: 0.000003 - momentum: 0.000000
2023-10-24 10:14:46,236 ----------------------------------------------------------------------------------------------------
2023-10-24 10:14:46,236 EPOCH 9 done: loss 0.0083 - lr: 0.000003
2023-10-24 10:14:54,743 DEV : loss 0.21085196733474731 - f1-score (micro avg) 0.8465
2023-10-24 10:14:54,765 saving best model
2023-10-24 10:14:55,461 ----------------------------------------------------------------------------------------------------
2023-10-24 10:15:03,114 epoch 10 - iter 73/738 - loss 0.00225549 - time (sec): 7.65 - samples/sec: 2254.22 - lr: 0.000003 - momentum: 0.000000
2023-10-24 10:15:09,825 epoch 10 - iter 146/738 - loss 0.00248018 - time (sec): 14.36 - samples/sec: 2313.87 - lr: 0.000003 - momentum: 0.000000
2023-10-24 10:15:16,491 epoch 10 - iter 219/738 - loss 0.00258571 - time (sec): 21.03 - samples/sec: 2308.26 - lr: 0.000002 - momentum: 0.000000
2023-10-24 10:15:23,656 epoch 10 - iter 292/738 - loss 0.00272149 - time (sec): 28.19 - samples/sec: 2317.62 - lr: 0.000002 - momentum: 0.000000
2023-10-24 10:15:31,083 epoch 10 - iter 365/738 - loss 0.00322358 - time (sec): 35.62 - samples/sec: 2357.62 - lr: 0.000002 - momentum: 0.000000
2023-10-24 10:15:38,019 epoch 10 - iter 438/738 - loss 0.00349996 - time (sec): 42.56 - samples/sec: 2352.67 - lr: 0.000001 - momentum: 0.000000
2023-10-24 10:15:45,376 epoch 10 - iter 511/738 - loss 0.00398963 - time (sec): 49.91 - samples/sec: 2355.98 - lr: 0.000001 - momentum: 0.000000
2023-10-24 10:15:52,559 epoch 10 - iter 584/738 - loss 0.00460257 - time (sec): 57.10 - samples/sec: 2352.17 -
lr: 0.000001 - momentum: 0.000000
2023-10-24 10:15:58,908 epoch 10 - iter 657/738 - loss 0.00450873 - time (sec): 63.45 - samples/sec: 2352.91 - lr: 0.000000 - momentum: 0.000000
2023-10-24 10:16:05,753 epoch 10 - iter 730/738 - loss 0.00478145 - time (sec): 70.29 - samples/sec: 2345.44 - lr: 0.000000 - momentum: 0.000000
2023-10-24 10:16:06,439 ----------------------------------------------------------------------------------------------------
2023-10-24 10:16:06,440 EPOCH 10 done: loss 0.0048 - lr: 0.000000
2023-10-24 10:16:14,959 DEV : loss 0.21770432591438293 - f1-score (micro avg) 0.8466
2023-10-24 10:16:14,981 saving best model
2023-10-24 10:16:16,242 ----------------------------------------------------------------------------------------------------
2023-10-24 10:16:16,243 Loading model from best epoch ...
2023-10-24 10:16:18,060 SequenceTagger predicts: Dictionary with 21 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org, S-time, B-time, E-time, I-time, S-prod, B-prod, E-prod, I-prod
2023-10-24 10:16:24,729 Results:
- F-score (micro) 0.7923
- F-score (macro) 0.7091
- Accuracy 0.6784

By class:
              precision    recall  f1-score   support

         loc     0.8467    0.8753    0.8607       858
        pers     0.7404    0.7914    0.7651       537
         org     0.5532    0.5909    0.5714       132
        time     0.5217    0.6667    0.5854        54
        prod     0.7895    0.7377    0.7627        61

   micro avg     0.7726    0.8130    0.7923      1642
   macro avg     0.6903    0.7324    0.7091      1642
weighted avg     0.7755    0.8130    0.7935      1642

2023-10-24 10:16:24,729 ----------------------------------------------------------------------------------------------------