+ deepspeed --num_nodes=1 --num_gpus=4 --master_port 47607 --module safe_rlhf.values.reward --train_datasets PKU-SafeRLHF/train:1.0:PKU-SafeRLHF-harmless-only-30k --eval_datasets PKU-SafeRLHF/test --model_name_or_path output/sft --max_length 512 --trust_remote_code True --loss_type sequence-wise --epochs 2 --per_device_train_batch_size 16 --per_device_eval_batch_size 16 --gradient_accumulation_steps 2 --gradient_checkpointing --normalize_score_during_training False --normalizer_type ExponentialMovingAverage --normalizer_momentum 0.9 --learning_rate 2e-5 --lr_scheduler_type cosine --lr_warmup_ratio 0.03 --weight_decay 0.1 --seed 42 --eval_strategy epoch --output_dir /data/jiongxiao_wang/rlhf_attack/safe-rlhf/output/rm_30k --log_type wandb --log_project Safe-RLHF-RM --zero_stage 3 --bf16 True --tf32 True
2024-01-05 20:02:46.835068: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-05 20:02:46.835067: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-05 20:02:46.835067: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-05 20:02:46.835114: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-05 20:02:46.835114: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-05 20:02:46.835114: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-05 20:02:46.835826: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-05 20:02:46.835865: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-05 20:02:46.836421: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-05 20:02:46.836422: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-05 20:02:46.836424: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-05 20:02:46.836771: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-05 20:02:48.497891: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-01-05 20:02:48.498124: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-01-05 20:02:48.498360: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-01-05 20:02:48.498588: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Some weights of the model checkpoint at output/sft were not used when initializing LlamaModelForScore: ['lm_head.weight']
- This IS expected if you are initializing LlamaModelForScore from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlamaModelForScore from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of the model checkpoint at output/sft were not used when initializing LlamaModelForScore: ['lm_head.weight']
- This IS expected if you are initializing LlamaModelForScore from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlamaModelForScore from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of the model checkpoint at output/sft were not used when initializing LlamaModelForScore: ['lm_head.weight']
- This IS expected if you are initializing LlamaModelForScore from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlamaModelForScore from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of LlamaModelForScore were not initialized from the model checkpoint at output/sft and are newly initialized: ['normalizer.count', 'normalizer.mean', 'normalizer.var']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of LlamaModelForScore were not initialized from the model checkpoint at output/sft and are newly initialized: ['normalizer.count', 'normalizer.var', 'normalizer.mean']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of LlamaModelForScore were not initialized from the model checkpoint at output/sft and are newly initialized: ['normalizer.mean', 'normalizer.var', 'normalizer.count']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
Some weights of the model checkpoint at output/sft were not used when initializing LlamaModelForScore: ['lm_head.weight']
- This IS expected if you are initializing LlamaModelForScore from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlamaModelForScore from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of LlamaModelForScore were not initialized from the model checkpoint at output/sft and are newly initialized: ['normalizer.var', 'normalizer.mean', 'normalizer.count']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
Using /data/jiongxiao_wang/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /data/jiongxiao_wang/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /data/jiongxiao_wang/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /data/jiongxiao_wang/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /data/jiongxiao_wang/.cache/torch_extensions/py310_cu117/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
wandb: Currently logged in as: jayfeather (jayfeather1024). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.1
wandb: Run data is saved locally in /data/jiongxiao_wang/rlhf_attack/safe-rlhf/output/rm_30k/wandb/run-20240105_200327-0bh9htd8
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run reward-2024-01-05-20-03-25
wandb: ⭐️ View project at https://wandb.ai/jayfeather1024/Safe-RLHF-RM
wandb: 🚀 View run at https://wandb.ai/jayfeather1024/Safe-RLHF-RM/runs/0bh9htd8

Training 1/2 epoch:   0%|          | 0/840 [00:00<?, ?it/s]
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...

Training 1/2 epoch (loss 0.6953):   0%|          | 0/840 [00:05<?, ?it/s]
Training 1/2 epoch (loss 0.6953):   0%|          | 1/840 [00:05<1:13:24,  5.25s/it]
Training 1/2 epoch (loss 0.6914):   0%|          | 1/840 [00:08<1:13:24,  5.25s/it]
Training 1/2 epoch (loss 0.6914):   0%|          | 2/840 [00:08<1:00:36,  4.34s/it]
Training 1/2 epoch (loss 0.6953):   0%|          | 2/840 [00:11<1:00:36,  4.34s/it]
Training 1/2 epoch (loss 0.6953):   0%|          | 3/840 [00:11<51:24,  3.69s/it]  
Training 1/2 epoch (loss 0.6914):   0%|          | 3/840 [00:14<51:24,  3.69s/it]
Training 1/2 epoch (loss 0.6914):   0%|          | 4/840 [00:14<47:14,  3.39s/it]
Training 1/2 epoch (loss 0.6953):   0%|          | 4/840 [00:19<47:14,  3.39s/it]
Training 1/2 epoch (loss 0.6953):   1%|          | 5/840 [00:19<52:19,  3.76s/it]
Training 1/2 epoch (loss 0.6953):   1%|          | 5/840 [00:22<52:19,  3.76s/it]
Training 1/2 epoch (loss 0.6953):   1%|          | 6/840 [00:22<50:49,  3.66s/it]
Training 1/2 epoch (loss 0.6953):   1%|          | 6/840 [00:26<50:49,  3.66s/it]
Training 1/2 epoch (loss 0.6953):   1%|          | 7/840 [00:26<51:38,  3.72s/it]
Training 1/2 epoch (loss 0.6953):   1%|          | 7/840 [00:32<51:38,  3.72s/it]
Training 1/2 epoch (loss 0.6953):   1%|          | 8/840 [00:32<59:29,  4.29s/it]
Training 1/2 epoch (loss 0.6914):   1%|          | 8/840 [00:35<59:29,  4.29s/it]
Training 1/2 epoch (loss 0.6914):   1%|          | 9/840 [00:35<57:58,  4.19s/it]
Training 1/2 epoch (loss 0.6914):   1%|          | 9/840 [00:39<57:58,  4.19s/it]
Training 1/2 epoch (loss 0.6914):   1%|          | 10/840 [00:39<53:10,  3.84s/it]
Training 1/2 epoch (loss 0.6953):   1%|          | 10/840 [00:41<53:10,  3.84s/it]
Training 1/2 epoch (loss 0.6953):   1%|▏         | 11/840 [00:41<47:33,  3.44s/it]
Training 1/2 epoch (loss 0.6953):   1%|▏         | 11/840 [00:45<47:33,  3.44s/it]
Training 1/2 epoch (loss 0.6953):   1%|▏         | 12/840 [00:45<50:24,  3.65s/it]
Training 1/2 epoch (loss 0.6953):   1%|▏         | 12/840 [00:48<50:24,  3.65s/it]
Training 1/2 epoch (loss 0.6953):   2%|▏         | 13/840 [00:48<45:59,  3.34s/it]
Training 1/2 epoch (loss 0.6953):   2%|▏         | 13/840 [00:51<45:59,  3.34s/it]
Training 1/2 epoch (loss 0.6953):   2%|▏         | 14/840 [00:51<46:46,  3.40s/it]
Training 1/2 epoch (loss 0.6914):   2%|▏         | 14/840 [00:54<46:46,  3.40s/it]
Training 1/2 epoch (loss 0.6914):   2%|▏         | 15/840 [00:54<44:37,  3.25s/it]
Training 1/2 epoch (loss 0.6875):   2%|▏         | 15/840 [00:58<44:37,  3.25s/it]
Training 1/2 epoch (loss 0.6875):   2%|▏         | 16/840 [00:58<45:35,  3.32s/it]
Training 1/2 epoch (loss 0.6953):   2%|▏         | 16/840 [01:01<45:35,  3.32s/it]
Training 1/2 epoch (loss 0.6953):   2%|▏         | 17/840 [01:01<43:42,  3.19s/it]
Training 1/2 epoch (loss 0.6914):   2%|▏         | 17/840 [01:04<43:42,  3.19s/it]
Training 1/2 epoch (loss 0.6914):   2%|▏         | 18/840 [01:04<44:04,  3.22s/it]
Training 1/2 epoch (loss 0.6953):   2%|▏         | 18/840 [01:08<44:04,  3.22s/it]
Training 1/2 epoch (loss 0.6953):   2%|▏         | 19/840 [01:08<48:10,  3.52s/it]
Training 1/2 epoch (loss 0.6875):   2%|▏         | 19/840 [01:12<48:10,  3.52s/it]
Training 1/2 epoch (loss 0.6875):   2%|▏         | 20/840 [01:12<49:31,  3.62s/it]
Training 1/2 epoch (loss 0.6953):   2%|▏         | 20/840 [01:16<49:31,  3.62s/it]
Training 1/2 epoch (loss 0.6953):   2%|▎         | 21/840 [01:16<49:34,  3.63s/it]
Training 1/2 epoch (loss 0.6914):   2%|▎         | 21/840 [01:18<49:34,  3.63s/it]
Training 1/2 epoch (loss 0.6914):   3%|▎         | 22/840 [01:18<45:56,  3.37s/it]
Training 1/2 epoch (loss 0.6953):   3%|▎         | 22/840 [01:21<45:56,  3.37s/it]
Training 1/2 epoch (loss 0.6953):   3%|▎         | 23/840 [01:21<43:13,  3.17s/it]
Training 1/2 epoch (loss 0.6875):   3%|▎         | 23/840 [01:25<43:13,  3.17s/it]
Training 1/2 epoch (loss 0.6875):   3%|▎         | 24/840 [01:25<45:43,  3.36s/it]
Training 1/2 epoch (loss 0.6797):   3%|▎         | 24/840 [01:30<45:43,  3.36s/it]
Training 1/2 epoch (loss 0.6797):   3%|▎         | 25/840 [01:30<51:21,  3.78s/it]
Training 1/2 epoch (loss 0.6836):   3%|▎         | 25/840 [01:35<51:21,  3.78s/it]
Training 1/2 epoch (loss 0.6836):   3%|▎         | 26/840 [01:35<56:20,  4.15s/it]
Training 1/2 epoch (loss 0.6641):   3%|▎         | 26/840 [01:38<56:20,  4.15s/it]
Training 1/2 epoch (loss 0.6641):   3%|▎         | 27/840 [01:38<53:18,  3.93s/it]
Training 1/2 epoch (loss 0.6562):   3%|▎         | 27/840 [01:42<53:18,  3.93s/it]
Training 1/2 epoch (loss 0.6562):   3%|▎         | 28/840 [01:42<50:55,  3.76s/it]
Training 1/2 epoch (loss 0.6602):   3%|▎         | 28/840 [01:44<50:55,  3.76s/it]
Training 1/2 epoch (loss 0.6602):   3%|▎         | 29/840 [01:44<45:44,  3.38s/it]
Training 1/2 epoch (loss 0.6562):   3%|▎         | 29/840 [01:49<45:44,  3.38s/it]
Training 1/2 epoch (loss 0.6562):   4%|▎         | 30/840 [01:49<52:16,  3.87s/it]
Training 1/2 epoch (loss 0.7031):   4%|▎         | 30/840 [01:54<52:16,  3.87s/it]
Training 1/2 epoch (loss 0.7031):   4%|▎         | 31/840 [01:54<55:12,  4.09s/it]
Training 1/2 epoch (loss 0.6602):   4%|▎         | 31/840 [01:57<55:12,  4.09s/it]
Training 1/2 epoch (loss 0.6602):   4%|▍         | 32/840 [01:57<52:52,  3.93s/it]
Training 1/2 epoch (loss 0.5938):   4%|▍         | 32/840 [02:03<52:52,  3.93s/it]
Training 1/2 epoch (loss 0.5938):   4%|▍         | 33/840 [02:03<58:43,  4.37s/it]
Training 1/2 epoch (loss 0.6289):   4%|▍         | 33/840 [02:06<58:43,  4.37s/it]
Training 1/2 epoch (loss 0.6289):   4%|▍         | 34/840 [02:06<55:01,  4.10s/it]
Training 1/2 epoch (loss 0.6055):   4%|▍         | 34/840 [02:11<55:01,  4.10s/it]
Training 1/2 epoch (loss 0.6055):   4%|▍         | 35/840 [02:11<56:57,  4.25s/it]
Training 1/2 epoch (loss 0.5391):   4%|▍         | 35/840 [02:14<56:57,  4.25s/it]
Training 1/2 epoch (loss 0.5391):   4%|▍         | 36/840 [02:14<54:14,  4.05s/it]
Training 1/2 epoch (loss 0.6562):   4%|▍         | 36/840 [02:17<54:14,  4.05s/it]
Training 1/2 epoch (loss 0.6562):   4%|▍         | 37/840 [02:17<50:34,  3.78s/it]
Training 1/2 epoch (loss 0.5430):   4%|▍         | 37/840 [02:21<50:34,  3.78s/it]
Training 1/2 epoch (loss 0.5430):   5%|▍         | 38/840 [02:21<48:03,  3.60s/it]
Training 1/2 epoch (loss 0.6016):   5%|▍         | 38/840 [02:25<48:03,  3.60s/it]
Training 1/2 epoch (loss 0.6016):   5%|▍         | 39/840 [02:25<52:33,  3.94s/it]
Training 1/2 epoch (loss 0.6094):   5%|▍         | 39/840 [02:30<52:33,  3.94s/it]
Training 1/2 epoch (loss 0.6094):   5%|▍         | 40/840 [02:30<57:21,  4.30s/it]
Training 1/2 epoch (loss 0.6172):   5%|▍         | 40/840 [02:35<57:21,  4.30s/it]
Training 1/2 epoch (loss 0.6172):   5%|▍         | 41/840 [02:35<57:42,  4.33s/it]
Training 1/2 epoch (loss 0.5938):   5%|▍         | 41/840 [02:38<57:42,  4.33s/it]
Training 1/2 epoch (loss 0.5938):   5%|▌         | 42/840 [02:38<54:38,  4.11s/it]
Training 1/2 epoch (loss 0.6172):   5%|▌         | 42/840 [02:43<54:38,  4.11s/it]
Training 1/2 epoch (loss 0.6172):   5%|▌         | 43/840 [02:43<55:57,  4.21s/it]
Training 1/2 epoch (loss 0.6797):   5%|▌         | 43/840 [02:47<55:57,  4.21s/it]
Training 1/2 epoch (loss 0.6797):   5%|▌         | 44/840 [02:47<55:39,  4.20s/it]
Training 1/2 epoch (loss 0.5820):   5%|▌         | 44/840 [02:50<55:39,  4.20s/it]
Training 1/2 epoch (loss 0.5820):   5%|▌         | 45/840 [02:50<49:10,  3.71s/it]
Training 1/2 epoch (loss 0.5469):   5%|▌         | 45/840 [02:55<49:10,  3.71s/it]
Training 1/2 epoch (loss 0.5469):   5%|▌         | 46/840 [02:55<56:44,  4.29s/it]
Training 1/2 epoch (loss 0.5859):   5%|▌         | 46/840 [02:59<56:44,  4.29s/it]
Training 1/2 epoch (loss 0.5859):   6%|▌         | 47/840 [02:59<54:20,  4.11s/it]
Training 1/2 epoch (loss 0.5859):   6%|▌         | 47/840 [03:02<54:20,  4.11s/it]
Training 1/2 epoch (loss 0.5859):   6%|▌         | 48/840 [03:02<50:15,  3.81s/it]
Training 1/2 epoch (loss 0.7266):   6%|▌         | 48/840 [03:05<50:15,  3.81s/it]
Training 1/2 epoch (loss 0.7266):   6%|▌         | 49/840 [03:05<45:19,  3.44s/it]
Training 1/2 epoch (loss 0.6914):   6%|▌         | 49/840 [03:07<45:19,  3.44s/it]
Training 1/2 epoch (loss 0.6914):   6%|▌         | 50/840 [03:07<42:28,  3.23s/it]
Training 1/2 epoch (loss 0.6016):   6%|▌         | 50/840 [03:11<42:28,  3.23s/it]
Training 1/2 epoch (loss 0.6016):   6%|▌         | 51/840 [03:11<46:00,  3.50s/it]
Training 1/2 epoch (loss 0.6250):   6%|▌         | 51/840 [03:15<46:00,  3.50s/it]
Training 1/2 epoch (loss 0.6250):   6%|▌         | 52/840 [03:15<44:52,  3.42s/it]
Training 1/2 epoch (loss 0.6328):   6%|▌         | 52/840 [03:18<44:52,  3.42s/it]
Training 1/2 epoch (loss 0.6328):   6%|▋         | 53/840 [03:18<42:53,  3.27s/it]
Training 1/2 epoch (loss 0.5938):   6%|▋         | 53/840 [03:21<42:53,  3.27s/it]
Training 1/2 epoch (loss 0.5938):   6%|▋         | 54/840 [03:21<42:21,  3.23s/it]
Training 1/2 epoch (loss 0.6289):   6%|▋         | 54/840 [03:24<42:21,  3.23s/it]
Training 1/2 epoch (loss 0.6289):   7%|▋         | 55/840 [03:24<43:03,  3.29s/it]
Training 1/2 epoch (loss 0.5859):   7%|▋         | 55/840 [03:27<43:03,  3.29s/it]
Training 1/2 epoch (loss 0.5859):   7%|▋         | 56/840 [03:27<42:25,  3.25s/it]
Training 1/2 epoch (loss 0.6719):   7%|▋         | 56/840 [03:30<42:25,  3.25s/it]
Training 1/2 epoch (loss 0.6719):   7%|▋         | 57/840 [03:30<41:27,  3.18s/it]
Training 1/2 epoch (loss 0.5859):   7%|▋         | 57/840 [03:34<41:27,  3.18s/it]
Training 1/2 epoch (loss 0.5859):   7%|▋         | 58/840 [03:34<42:55,  3.29s/it]
Training 1/2 epoch (loss 0.6406):   7%|▋         | 58/840 [03:39<42:55,  3.29s/it]
Training 1/2 epoch (loss 0.6406):   7%|▋         | 59/840 [03:39<48:06,  3.70s/it]
Training 1/2 epoch (loss 0.5312):   7%|▋         | 59/840 [03:44<48:06,  3.70s/it]
Training 1/2 epoch (loss 0.5312):   7%|▋         | 60/840 [03:44<54:55,  4.22s/it]
Training 1/2 epoch (loss 0.5547):   7%|▋         | 60/840 [03:47<54:55,  4.22s/it]
Training 1/2 epoch (loss 0.5547):   7%|▋         | 61/840 [03:47<50:47,  3.91s/it]
Training 1/2 epoch (loss 0.6914):   7%|▋         | 61/840 [03:51<50:47,  3.91s/it]
Training 1/2 epoch (loss 0.6914):   7%|▋         | 62/840 [03:51<50:41,  3.91s/it]
Training 1/2 epoch (loss 0.6484):   7%|▋         | 62/840 [03:57<50:41,  3.91s/it]
Training 1/2 epoch (loss 0.6484):   8%|▊         | 63/840 [03:57<56:32,  4.37s/it]
Training 1/2 epoch (loss 0.7578):   8%|▊         | 63/840 [04:01<56:32,  4.37s/it]
Training 1/2 epoch (loss 0.7578):   8%|▊         | 64/840 [04:01<57:20,  4.43s/it]
Training 1/2 epoch (loss 0.5820):   8%|▊         | 64/840 [04:04<57:20,  4.43s/it]
Training 1/2 epoch (loss 0.5820):   8%|▊         | 65/840 [04:04<51:12,  3.97s/it]
Training 1/2 epoch (loss 0.5977):   8%|▊         | 65/840 [04:07<51:12,  3.97s/it]
Training 1/2 epoch (loss 0.5977):   8%|▊         | 66/840 [04:07<46:42,  3.62s/it]
Training 1/2 epoch (loss 0.6758):   8%|▊         | 66/840 [04:11<46:42,  3.62s/it]
Training 1/2 epoch (loss 0.6758):   8%|▊         | 67/840 [04:11<48:13,  3.74s/it]
Training 1/2 epoch (loss 0.5938):   8%|▊         | 67/840 [04:15<48:13,  3.74s/it]
Training 1/2 epoch (loss 0.5938):   8%|▊         | 68/840 [04:15<50:21,  3.91s/it]
Training 1/2 epoch (loss 0.6719):   8%|▊         | 68/840 [04:18<50:21,  3.91s/it]
Training 1/2 epoch (loss 0.6719):   8%|▊         | 69/840 [04:18<45:26,  3.54s/it]
Training 1/2 epoch (loss 0.5508):   8%|▊         | 69/840 [04:22<45:26,  3.54s/it]
Training 1/2 epoch (loss 0.5508):   8%|▊         | 70/840 [04:22<47:49,  3.73s/it]
Training 1/2 epoch (loss 0.5820):   8%|▊         | 70/840 [04:25<47:49,  3.73s/it]
Training 1/2 epoch (loss 0.5820):   8%|▊         | 71/840 [04:25<46:18,  3.61s/it]
Training 1/2 epoch (loss 0.6055):   8%|▊         | 71/840 [04:31<46:18,  3.61s/it]
Training 1/2 epoch (loss 0.6055):   9%|▊         | 72/840 [04:31<53:57,  4.21s/it]
Training 1/2 epoch (loss 0.5938):   9%|▊         | 72/840 [04:36<53:57,  4.21s/it]
Training 1/2 epoch (loss 0.5938):   9%|▊         | 73/840 [04:36<58:31,  4.58s/it]
Training 1/2 epoch (loss 0.6133):   9%|▊         | 73/840 [04:41<58:31,  4.58s/it]
Training 1/2 epoch (loss 0.6133):   9%|▉         | 74/840 [04:41<58:52,  4.61s/it]
Training 1/2 epoch (loss 0.5938):   9%|▉         | 74/840 [04:47<58:52,  4.61s/it]
Training 1/2 epoch (loss 0.5938):   9%|▉         | 75/840 [04:47<1:02:10,  4.88s/it]
Training 1/2 epoch (loss 0.6094):   9%|▉         | 75/840 [04:49<1:02:10,  4.88s/it]
Training 1/2 epoch (loss 0.6094):   9%|▉         | 76/840 [04:49<53:39,  4.21s/it]
Training 1/2 epoch (loss 0.6172):   9%|▉         | 76/840 [04:52<53:39,  4.21s/it]
Training 1/2 epoch (loss 0.6172):   9%|▉         | 77/840 [04:52<49:35,  3.90s/it]
Training 1/2 epoch (loss 0.6094):   9%|▉         | 77/840 [04:55<49:35,  3.90s/it]
Training 1/2 epoch (loss 0.6094):   9%|▉         | 78/840 [04:55<45:17,  3.57s/it]
Training 1/2 epoch (loss 0.5664):   9%|▉         | 78/840 [05:00<45:17,  3.57s/it]
Training 1/2 epoch (loss 0.5664):   9%|▉         | 79/840 [05:00<48:35,  3.83s/it]
Training 1/2 epoch (loss 0.6523):   9%|▉         | 79/840 [05:03<48:35,  3.83s/it]
Training 1/2 epoch (loss 0.6523):  10%|▉         | 80/840 [05:03<45:59,  3.63s/it]
Training 1/2 epoch (loss 0.5625):  10%|▉         | 80/840 [05:06<45:59,  3.63s/it]
Training 1/2 epoch (loss 0.5625):  10%|▉         | 81/840 [05:06<44:57,  3.55s/it]
Training 1/2 epoch (loss 0.6328):  10%|▉         | 81/840 [05:09<44:57,  3.55s/it]
Training 1/2 epoch (loss 0.6328):  10%|▉         | 82/840 [05:09<44:02,  3.49s/it]
Training 1/2 epoch (loss 0.5781):  10%|▉         | 82/840 [05:14<44:02,  3.49s/it]
Training 1/2 epoch (loss 0.5781):  10%|▉         | 83/840 [05:14<47:30,  3.77s/it]
Training 1/2 epoch (loss 0.5625):  10%|▉         | 83/840 [05:18<47:30,  3.77s/it]
Training 1/2 epoch (loss 0.5625):  10%|█         | 84/840 [05:18<47:50,  3.80s/it]
Training 1/2 epoch (loss 0.6211):  10%|█         | 84/840 [05:21<47:50,  3.80s/it]
Training 1/2 epoch (loss 0.6211):  10%|█         | 85/840 [05:21<44:31,  3.54s/it]
Training 1/2 epoch (loss 0.6953):  10%|█         | 85/840 [05:24<44:31,  3.54s/it]
Training 1/2 epoch (loss 0.6953):  10%|█         | 86/840 [05:24<41:40,  3.32s/it]
Training 1/2 epoch (loss 0.5977):  10%|█         | 86/840 [05:29<41:40,  3.32s/it]
Training 1/2 epoch (loss 0.5977):  10%|█         | 87/840 [05:29<49:26,  3.94s/it]
Training 1/2 epoch (loss 0.6562):  10%|█         | 87/840 [05:33<49:26,  3.94s/it]
Training 1/2 epoch (loss 0.6562):  10%|█         | 88/840 [05:33<50:02,  3.99s/it]
Training 1/2 epoch (loss 0.6250):  10%|█         | 88/840 [05:37<50:02,  3.99s/it]
Training 1/2 epoch (loss 0.6250):  11%|█         | 89/840 [05:37<48:02,  3.84s/it]
Training 1/2 epoch (loss 0.5547):  11%|█         | 89/840 [05:40<48:02,  3.84s/it]
Training 1/2 epoch (loss 0.5547):  11%|█         | 90/840 [05:40<46:38,  3.73s/it]
Training 1/2 epoch (loss 0.6836):  11%|█         | 90/840 [05:43<46:38,  3.73s/it]
Training 1/2 epoch (loss 0.6836):  11%|█         | 91/840 [05:43<42:55,  3.44s/it]
Training 1/2 epoch (loss 0.5664):  11%|█         | 91/840 [05:46<42:55,  3.44s/it]
Training 1/2 epoch (loss 0.5664):  11%|█         | 92/840 [05:46<41:53,  3.36s/it]
Training 1/2 epoch (loss 0.6250):  11%|█         | 92/840 [05:49<41:53,  3.36s/it]
Training 1/2 epoch (loss 0.6250):  11%|█         | 93/840 [05:49<39:38,  3.18s/it]
Training 1/2 epoch (loss 0.5781):  11%|█         | 93/840 [05:52<39:38,  3.18s/it]
Training 1/2 epoch (loss 0.5781):  11%|█         | 94/840 [05:52<39:49,  3.20s/it]
Training 1/2 epoch (loss 0.5977):  11%|█         | 94/840 [05:56<39:49,  3.20s/it]
Training 1/2 epoch (loss 0.5977):  11%|█▏        | 95/840 [05:56<42:01,  3.38s/it]
Training 1/2 epoch (loss 0.5664):  11%|█▏        | 95/840 [06:01<42:01,  3.38s/it]
Training 1/2 epoch (loss 0.5664):  11%|█▏        | 96/840 [06:01<50:17,  4.06s/it]
Training 1/2 epoch (loss 0.5820):  11%|█▏        | 96/840 [06:05<50:17,  4.06s/it]
Training 1/2 epoch (loss 0.5820):  12%|█▏        | 97/840 [06:05<50:09,  4.05s/it]
Training 1/2 epoch (loss 0.5586):  12%|█▏        | 97/840 [06:10<50:09,  4.05s/it]
Training 1/2 epoch (loss 0.5586):  12%|█▏        | 98/840 [06:10<50:33,  4.09s/it]
Training 1/2 epoch (loss 0.5547):  12%|█▏        | 98/840 [06:12<50:33,  4.09s/it]
Training 1/2 epoch (loss 0.5547):  12%|█▏        | 99/840 [06:12<46:02,  3.73s/it]
Training 1/2 epoch (loss 0.5352):  12%|█▏        | 99/840 [06:16<46:02,  3.73s/it]
Training 1/2 epoch (loss 0.5352):  12%|█▏        | 100/840 [06:16<43:30,  3.53s/it]
Training 1/2 epoch (loss 0.5312):  12%|█▏        | 100/840 [06:21<43:30,  3.53s/it]
Training 1/2 epoch (loss 0.5312):  12%|█▏        | 101/840 [06:21<50:42,  4.12s/it]
Training 1/2 epoch (loss 0.6172):  12%|█▏        | 101/840 [06:26<50:42,  4.12s/it]
Training 1/2 epoch (loss 0.6172):  12%|█▏        | 102/840 [06:26<52:12,  4.24s/it]
Training 1/2 epoch (loss 0.6094):  12%|█▏        | 102/840 [06:29<52:12,  4.24s/it]
Training 1/2 epoch (loss 0.6094):  12%|█▏        | 103/840 [06:29<47:50,  3.90s/it]
Training 1/2 epoch (loss 0.6445):  12%|█▏        | 103/840 [06:32<47:50,  3.90s/it]
Training 1/2 epoch (loss 0.6445):  12%|█▏        | 104/840 [06:32<46:53,  3.82s/it]
Training 1/2 epoch (loss 0.5781):  12%|█▏        | 104/840 [06:36<46:53,  3.82s/it]
Training 1/2 epoch (loss 0.5781):  12%|█▎        | 105/840 [06:36<47:00,  3.84s/it]
Training 1/2 epoch (loss 0.6289):  12%|█▎        | 105/840 [06:41<47:00,  3.84s/it]
Training 1/2 epoch (loss 0.6289):  13%|█▎        | 106/840 [06:41<49:32,  4.05s/it]
Training 1/2 epoch (loss 0.5508):  13%|█▎        | 106/840 [06:46<49:32,  4.05s/it]
Training 1/2 epoch (loss 0.5508):  13%|█▎        | 107/840 [06:46<54:25,  4.46s/it]
Training 1/2 epoch (loss 0.5664):  13%|█▎        | 107/840 [06:49<54:25,  4.46s/it]
Training 1/2 epoch (loss 0.5664):  13%|█▎        | 108/840 [06:49<49:55,  4.09s/it]
Training 1/2 epoch (loss 0.5742):  13%|█▎        | 108/840 [06:53<49:55,  4.09s/it]
Training 1/2 epoch (loss 0.5742):  13%|█▎        | 109/840 [06:53<46:38,  3.83s/it]
Training 1/2 epoch (loss 0.6094):  13%|█▎        | 109/840 [06:58<46:38,  3.83s/it]
Training 1/2 epoch (loss 0.6094):  13%|█▎        | 110/840 [06:58<52:31,  4.32s/it]
Training 1/2 epoch (loss 0.6719):  13%|█▎        | 110/840 [07:01<52:31,  4.32s/it]
Training 1/2 epoch (loss 0.6719):  13%|█▎        | 111/840 [07:01<46:24,  3.82s/it]
Training 1/2 epoch (loss 0.5977):  13%|█▎        | 111/840 [07:06<46:24,  3.82s/it]
Training 1/2 epoch (loss 0.5977):  13%|█▎        | 112/840 [07:06<52:37,  4.34s/it]
Training 1/2 epoch (loss 0.6133):  13%|█▎        | 112/840 [07:12<52:37,  4.34s/it]
Training 1/2 epoch (loss 0.6133):  13%|█▎        | 113/840 [07:12<56:29,  4.66s/it]
Training 1/2 epoch (loss 0.6523):  13%|█▎        | 113/840 [07:15<56:29,  4.66s/it]
Training 1/2 epoch (loss 0.6523):  14%|█▎        | 114/840 [07:15<50:43,  4.19s/it]
Training 1/2 epoch (loss 0.5547):  14%|█▎        | 114/840 [07:18<50:43,  4.19s/it]
Training 1/2 epoch (loss 0.5547):  14%|█▎        | 115/840 [07:18<49:00,  4.06s/it]
Training 1/2 epoch (loss 0.5469):  14%|█▎        | 115/840 [07:22<49:00,  4.06s/it]
Training 1/2 epoch (loss 0.5469):  14%|█▍        | 116/840 [07:22<46:56,  3.89s/it]
Training 1/2 epoch (loss 0.5625):  14%|█▍        | 116/840 [07:25<46:56,  3.89s/it]
Training 1/2 epoch (loss 0.5625):  14%|█▍        | 117/840 [07:25<44:30,  3.69s/it]
Training 1/2 epoch (loss 0.6094):  14%|█▍        | 117/840 [07:30<44:30,  3.69s/it]
Training 1/2 epoch (loss 0.6094):  14%|█▍        | 118/840 [07:30<46:47,  3.89s/it]
Training 1/2 epoch (loss 0.6797):  14%|█▍        | 118/840 [07:34<46:47,  3.89s/it]
Training 1/2 epoch (loss 0.6797):  14%|█▍        | 119/840 [07:34<49:41,  4.14s/it]
Training 1/2 epoch (loss 0.6172):  14%|█▍        | 119/840 [07:38<49:41,  4.14s/it]
Training 1/2 epoch (loss 0.6172):  14%|█▍        | 120/840 [07:38<46:26,  3.87s/it]
Training 1/2 epoch (loss 0.5117):  14%|█▍        | 120/840 [07:41<46:26,  3.87s/it]
Training 1/2 epoch (loss 0.5117):  14%|█▍        | 121/840 [07:41<46:22,  3.87s/it]
Training 1/2 epoch (loss 0.5859):  14%|█▍        | 121/840 [07:45<46:22,  3.87s/it]
Training 1/2 epoch (loss 0.5859):  15%|█▍        | 122/840 [07:45<45:58,  3.84s/it]
Training 1/2 epoch (loss 0.6523):  15%|█▍        | 122/840 [07:48<45:58,  3.84s/it]
Training 1/2 epoch (loss 0.6523):  15%|█▍        | 123/840 [07:48<43:21,  3.63s/it]
Training 1/2 epoch (loss 0.4961):  15%|█▍        | 123/840 [07:52<43:21,  3.63s/it]
Training 1/2 epoch (loss 0.4961):  15%|█▍        | 124/840 [07:52<45:04,  3.78s/it]
Training 1/2 epoch (loss 0.5547):  15%|█▍        | 124/840 [07:55<45:04,  3.78s/it]
Training 1/2 epoch (loss 0.5547):  15%|█▍        | 125/840 [07:55<40:59,  3.44s/it]
Training 1/2 epoch (loss 0.5586):  15%|█▍        | 125/840 [07:58<40:59,  3.44s/it]
Training 1/2 epoch (loss 0.5586):  15%|█▌        | 126/840 [07:58<40:06,  3.37s/it]
Training 1/2 epoch (loss 0.5586):  15%|█▌        | 126/840 [08:01<40:06,  3.37s/it]
Training 1/2 epoch (loss 0.5586):  15%|█▌        | 127/840 [08:01<39:16,  3.31s/it]
Training 1/2 epoch (loss 0.6484):  15%|█▌        | 127/840 [08:07<39:16,  3.31s/it]
Training 1/2 epoch (loss 0.6484):  15%|█▌        | 128/840 [08:07<47:01,  3.96s/it]
Training 1/2 epoch (loss 0.5820):  15%|█▌        | 128/840 [08:10<47:01,  3.96s/it]
Training 1/2 epoch (loss 0.5820):  15%|█▌        | 129/840 [08:10<43:25,  3.66s/it]
Training 1/2 epoch (loss 0.6172):  15%|█▌        | 129/840 [08:13<43:25,  3.66s/it]
Training 1/2 epoch (loss 0.6172):  15%|█▌        | 130/840 [08:13<41:16,  3.49s/it]
Training 1/2 epoch (loss 0.5703):  15%|█▌        | 130/840 [08:16<41:16,  3.49s/it]
Training 1/2 epoch (loss 0.5703):  16%|█▌        | 131/840 [08:16<38:30,  3.26s/it]
Training 1/2 epoch (loss 0.6055):  16%|█▌        | 131/840 [08:21<38:30,  3.26s/it]
Training 1/2 epoch (loss 0.6055):  16%|█▌        | 132/840 [08:21<44:13,  3.75s/it]
Training 1/2 epoch (loss 0.5625):  16%|█▌        | 132/840 [08:25<44:13,  3.75s/it]
Training 1/2 epoch (loss 0.5625):  16%|█▌        | 133/840 [08:25<45:08,  3.83s/it]
Training 1/2 epoch (loss 0.6797):  16%|█▌        | 133/840 [08:29<45:08,  3.83s/it]
Training 1/2 epoch (loss 0.6797):  16%|█▌        | 134/840 [08:29<47:18,  4.02s/it]
Training 1/2 epoch (loss 0.6016):  16%|█▌        | 134/840 [08:32<47:18,  4.02s/it]
Training 1/2 epoch (loss 0.6016):  16%|█▌        | 135/840 [08:32<44:09,  3.76s/it]
Training 1/2 epoch (loss 0.6602):  16%|█▌        | 135/840 [08:36<44:09,  3.76s/it]
Training 1/2 epoch (loss 0.6602):  16%|█▌        | 136/840 [08:36<44:22,  3.78s/it]
Training 1/2 epoch (loss 0.5430):  16%|█▌        | 136/840 [08:39<44:22,  3.78s/it]
Training 1/2 epoch (loss 0.5430):  16%|█▋        | 137/840 [08:39<40:15,  3.44s/it]
Training 1/2 epoch (loss 0.5508):  16%|█▋        | 137/840 [08:43<40:15,  3.44s/it]
Training 1/2 epoch (loss 0.5508):  16%|█▋        | 138/840 [08:43<44:42,  3.82s/it]
Training 1/2 epoch (loss 0.6172):  16%|█▋        | 138/840 [08:47<44:42,  3.82s/it]
Training 1/2 epoch (loss 0.6172):  17%|█▋        | 139/840 [08:47<45:27,  3.89s/it]
Training 1/2 epoch (loss 0.5039):  17%|█▋        | 139/840 [08:51<45:27,  3.89s/it]
Training 1/2 epoch (loss 0.5039):  17%|█▋        | 140/840 [08:51<43:36,  3.74s/it]
Training 1/2 epoch (loss 0.5391):  17%|█▋        | 140/840 [08:56<43:36,  3.74s/it]
Training 1/2 epoch (loss 0.5391):  17%|█▋        | 141/840 [08:56<47:11,  4.05s/it]
Training 1/2 epoch (loss 0.5547):  17%|█▋        | 141/840 [09:01<47:11,  4.05s/it]
Training 1/2 epoch (loss 0.5547):  17%|█▋        | 142/840 [09:01<52:18,  4.50s/it]
Training 1/2 epoch (loss 0.5742):  17%|█▋        | 142/840 [09:05<52:18,  4.50s/it]
Training 1/2 epoch (loss 0.5742):  17%|█▋        | 143/840 [09:05<49:46,  4.28s/it]
Training 1/2 epoch (loss 0.4531):  17%|█▋        | 143/840 [09:08<49:46,  4.28s/it]
Training 1/2 epoch (loss 0.4531):  17%|█▋        | 144/840 [09:08<45:43,  3.94s/it]
Training 1/2 epoch (loss 0.6133):  17%|█▋        | 144/840 [09:11<45:43,  3.94s/it]
Training 1/2 epoch (loss 0.6133):  17%|█▋        | 145/840 [09:11<42:21,  3.66s/it]
Training 1/2 epoch (loss 0.5312):  17%|█▋        | 145/840 [09:16<42:21,  3.66s/it]
Training 1/2 epoch (loss 0.5312):  17%|█▋        | 146/840 [09:16<47:11,  4.08s/it]
Training 1/2 epoch (loss 0.5742):  17%|█▋        | 146/840 [09:20<47:11,  4.08s/it]
Training 1/2 epoch (loss 0.5742):  18%|█▊        | 147/840 [09:20<45:58,  3.98s/it]
Training 1/2 epoch (loss 0.6641):  18%|█▊        | 147/840 [09:24<45:58,  3.98s/it]
Training 1/2 epoch (loss 0.6641):  18%|█▊        | 148/840 [09:24<45:39,  3.96s/it]
Training 1/2 epoch (loss 0.5859):  18%|█▊        | 148/840 [09:28<45:39,  3.96s/it]
Training 1/2 epoch (loss 0.5859):  18%|█▊        | 149/840 [09:28<45:25,  3.94s/it]
Training 1/2 epoch (loss 0.6172):  18%|█▊        | 149/840 [09:31<45:25,  3.94s/it]
Training 1/2 epoch (loss 0.6172):  18%|█▊        | 150/840 [09:31<42:35,  3.70s/it]
Training 1/2 epoch (loss 0.6562):  18%|█▊        | 150/840 [09:35<42:35,  3.70s/it]
Training 1/2 epoch (loss 0.6562):  18%|█▊        | 151/840 [09:35<42:29,  3.70s/it]
Training 1/2 epoch (loss 0.5664):  18%|█▊        | 151/840 [09:38<42:29,  3.70s/it]
Training 1/2 epoch (loss 0.5664):  18%|█▊        | 152/840 [09:38<42:39,  3.72s/it]
Training 1/2 epoch (loss 0.4863):  18%|█▊        | 152/840 [09:42<42:39,  3.72s/it]
Training 1/2 epoch (loss 0.4863):  18%|█▊        | 153/840 [09:42<42:37,  3.72s/it]
Training 1/2 epoch (loss 0.5625):  18%|█▊        | 153/840 [09:45<42:37,  3.72s/it]
Training 1/2 epoch (loss 0.5625):  18%|█▊        | 154/840 [09:45<40:46,  3.57s/it]
Training 1/2 epoch (loss 0.7422):  18%|█▊        | 154/840 [09:50<40:46,  3.57s/it]
Training 1/2 epoch (loss 0.7422):  18%|█▊        | 155/840 [09:50<44:16,  3.88s/it]
Training 1/2 epoch (loss 0.5703):  18%|█▊        | 155/840 [09:54<44:16,  3.88s/it]
Training 1/2 epoch (loss 0.5703):  19%|█▊        | 156/840 [09:54<43:53,  3.85s/it]
Training 1/2 epoch (loss 0.5625):  19%|█▊        | 156/840 [09:57<43:53,  3.85s/it]
Training 1/2 epoch (loss 0.5625):  19%|█▊        | 157/840 [09:57<41:39,  3.66s/it]
Training 1/2 epoch (loss 0.5742):  19%|█▊        | 157/840 [10:00<41:39,  3.66s/it]
Training 1/2 epoch (loss 0.5742):  19%|█▉        | 158/840 [10:00<38:51,  3.42s/it]
Training 1/2 epoch (loss 0.5195):  19%|█▉        | 158/840 [10:03<38:51,  3.42s/it]
Training 1/2 epoch (loss 0.5195):  19%|█▉        | 159/840 [10:03<39:53,  3.52s/it]
Training 1/2 epoch (loss 0.6211):  19%|█▉        | 159/840 [10:08<39:53,  3.52s/it]
Training 1/2 epoch (loss 0.6211):  19%|█▉        | 160/840 [10:08<42:35,  3.76s/it]
Training 1/2 epoch (loss 0.5781):  19%|█▉        | 160/840 [10:12<42:35,  3.76s/it]
Training 1/2 epoch (loss 0.5781):  19%|█▉        | 161/840 [10:12<44:55,  3.97s/it]
Training 1/2 epoch (loss 0.6797):  19%|█▉        | 161/840 [10:18<44:55,  3.97s/it]
Training 1/2 epoch (loss 0.6797):  19%|█▉        | 162/840 [10:18<50:23,  4.46s/it]
Training 1/2 epoch (loss 0.6367):  19%|█▉        | 162/840 [10:22<50:23,  4.46s/it]
Training 1/2 epoch (loss 0.6367):  19%|█▉        | 163/840 [10:22<49:29,  4.39s/it]
Training 1/2 epoch (loss 0.5469):  19%|█▉        | 163/840 [10:25<49:29,  4.39s/it]
Training 1/2 epoch (loss 0.5469):  20%|█▉        | 164/840 [10:25<45:14,  4.02s/it]
Training 1/2 epoch (loss 0.5781):  20%|█▉        | 164/840 [10:29<45:14,  4.02s/it]
Training 1/2 epoch (loss 0.5781):  20%|█▉        | 165/840 [10:29<44:03,  3.92s/it]
Training 1/2 epoch (loss 0.5859):  20%|█▉        | 165/840 [10:32<44:03,  3.92s/it]
Training 1/2 epoch (loss 0.5859):  20%|█▉        | 166/840 [10:32<42:16,  3.76s/it]
Training 1/2 epoch (loss 0.5000):  20%|█▉        | 166/840 [10:35<42:16,  3.76s/it]
Training 1/2 epoch (loss 0.5000):  20%|█▉        | 167/840 [10:35<39:59,  3.56s/it]
Training 1/2 epoch (loss 0.5195):  20%|█▉        | 167/840 [10:39<39:59,  3.56s/it]
Training 1/2 epoch (loss 0.5195):  20%|██        | 168/840 [10:39<39:59,  3.57s/it]
Training 1/2 epoch (loss 0.5742):  20%|██        | 168/840 [10:42<39:59,  3.57s/it]
Training 1/2 epoch (loss 0.5742):  20%|██        | 169/840 [10:42<39:15,  3.51s/it]
Training 1/2 epoch (loss 0.5781):  20%|██        | 169/840 [10:46<39:15,  3.51s/it]
Training 1/2 epoch (loss 0.5781):  20%|██        | 170/840 [10:46<39:32,  3.54s/it]
Training 1/2 epoch (loss 0.5820):  20%|██        | 170/840 [10:50<39:32,  3.54s/it]
Training 1/2 epoch (loss 0.5820):  20%|██        | 171/840 [10:50<41:20,  3.71s/it]
Training 1/2 epoch (loss 0.5430):  20%|██        | 171/840 [10:54<41:20,  3.71s/it]
Training 1/2 epoch (loss 0.5430):  20%|██        | 172/840 [10:54<40:56,  3.68s/it]
Training 1/2 epoch (loss 0.5469):  20%|██        | 172/840 [10:57<40:56,  3.68s/it]
Training 1/2 epoch (loss 0.5469):  21%|██        | 173/840 [10:57<41:14,  3.71s/it]
Training 1/2 epoch (loss 0.5938):  21%|██        | 173/840 [11:00<41:14,  3.71s/it]
Training 1/2 epoch (loss 0.5938):  21%|██        | 174/840 [11:00<38:10,  3.44s/it]
Training 1/2 epoch (loss 0.5938):  21%|██        | 174/840 [11:03<38:10,  3.44s/it]
Training 1/2 epoch (loss 0.5938):  21%|██        | 175/840 [11:03<35:42,  3.22s/it]
Training 1/2 epoch (loss 0.5547):  21%|██        | 175/840 [11:06<35:42,  3.22s/it]
Training 1/2 epoch (loss 0.5547):  21%|██        | 176/840 [11:06<35:27,  3.20s/it]
Training 1/2 epoch (loss 0.5938):  21%|██        | 176/840 [11:12<35:27,  3.20s/it]
Training 1/2 epoch (loss 0.5938):  21%|██        | 177/840 [11:12<42:50,  3.88s/it]
Training 1/2 epoch (loss 0.6094):  21%|██        | 177/840 [11:15<42:50,  3.88s/it]
Training 1/2 epoch (loss 0.6094):  21%|██        | 178/840 [11:15<42:03,  3.81s/it]
Training 1/2 epoch (loss 0.5938):  21%|██        | 178/840 [11:20<42:03,  3.81s/it]
Training 1/2 epoch (loss 0.5938):  21%|██▏       | 179/840 [11:20<43:27,  3.94s/it]
Training 1/2 epoch (loss 0.5547):  21%|██▏       | 179/840 [11:22<43:27,  3.94s/it]
Training 1/2 epoch (loss 0.5547):  21%|██▏       | 180/840 [11:22<38:58,  3.54s/it]
Training 1/2 epoch (loss 0.4941):  21%|██▏       | 180/840 [11:25<38:58,  3.54s/it]
Training 1/2 epoch (loss 0.4941):  22%|██▏       | 181/840 [11:25<37:41,  3.43s/it]
Training 1/2 epoch (loss 0.5391):  22%|██▏       | 181/840 [11:28<37:41,  3.43s/it]
Training 1/2 epoch (loss 0.5391):  22%|██▏       | 182/840 [11:28<35:17,  3.22s/it]
Training 1/2 epoch (loss 0.5586):  22%|██▏       | 182/840 [11:31<35:17,  3.22s/it]
Training 1/2 epoch (loss 0.5586):  22%|██▏       | 183/840 [11:31<34:56,  3.19s/it]
Training 1/2 epoch (loss 0.5664):  22%|██▏       | 183/840 [11:34<34:56,  3.19s/it]
Training 1/2 epoch (loss 0.5664):  22%|██▏       | 184/840 [11:34<34:53,  3.19s/it]
Training 1/2 epoch (loss 0.6250):  22%|██▏       | 184/840 [11:38<34:53,  3.19s/it]
Training 1/2 epoch (loss 0.6250):  22%|██▏       | 185/840 [11:38<35:53,  3.29s/it]
Training 1/2 epoch (loss 0.5625):  22%|██▏       | 185/840 [11:42<35:53,  3.29s/it]
Training 1/2 epoch (loss 0.5625):  22%|██▏       | 186/840 [11:42<39:21,  3.61s/it]
Training 1/2 epoch (loss 0.6328):  22%|██▏       | 186/840 [11:46<39:21,  3.61s/it]
Training 1/2 epoch (loss 0.6328):  22%|██▏       | 187/840 [11:46<38:48,  3.57s/it]
Training 1/2 epoch (loss 0.6055):  22%|██▏       | 187/840 [11:50<38:48,  3.57s/it]
Training 1/2 epoch (loss 0.6055):  22%|██▏       | 188/840 [11:50<41:44,  3.84s/it]
Training 1/2 epoch (loss 0.6289):  22%|██▏       | 188/840 [11:53<41:44,  3.84s/it]
Training 1/2 epoch (loss 0.6289):  22%|██▎       | 189/840 [11:53<39:10,  3.61s/it]
Training 1/2 epoch (loss 0.5625):  22%|██▎       | 189/840 [11:57<39:10,  3.61s/it]
Training 1/2 epoch (loss 0.5625):  23%|██▎       | 190/840 [11:57<38:05,  3.52s/it]
Training 1/2 epoch (loss 0.4395):  23%|██▎       | 190/840 [12:00<38:05,  3.52s/it]
Training 1/2 epoch (loss 0.4395):  23%|██▎       | 191/840 [12:00<39:07,  3.62s/it]
Training 1/2 epoch (loss 0.6641):  23%|██▎       | 191/840 [12:04<39:07,  3.62s/it]
Training 1/2 epoch (loss 0.6641):  23%|██▎       | 192/840 [12:04<38:55,  3.60s/it]
Training 1/2 epoch (loss 0.5312):  23%|██▎       | 192/840 [12:07<38:55,  3.60s/it]
Training 1/2 epoch (loss 0.5312):  23%|██▎       | 193/840 [12:07<37:09,  3.45s/it]
Training 1/2 epoch (loss 0.5195):  23%|██▎       | 193/840 [12:11<37:09,  3.45s/it]
Training 1/2 epoch (loss 0.5195):  23%|██▎       | 194/840 [12:11<38:19,  3.56s/it]
Training 1/2 epoch (loss 0.5352):  23%|██▎       | 194/840 [12:14<38:19,  3.56s/it]
Training 1/2 epoch (loss 0.5352):  23%|██▎       | 195/840 [12:14<35:59,  3.35s/it]
Training 1/2 epoch (loss 0.5859):  23%|██▎       | 195/840 [12:17<35:59,  3.35s/it]
Training 1/2 epoch (loss 0.5859):  23%|██▎       | 196/840 [12:17<36:03,  3.36s/it]
Training 1/2 epoch (loss 0.5625):  23%|██▎       | 196/840 [12:23<36:03,  3.36s/it]
Training 1/2 epoch (loss 0.5625):  23%|██▎       | 197/840 [12:23<42:51,  4.00s/it]
Training 1/2 epoch (loss 0.5547):  23%|██▎       | 197/840 [12:26<42:51,  4.00s/it]
Training 1/2 epoch (loss 0.5547):  24%|██▎       | 198/840 [12:26<42:06,  3.94s/it]
Training 1/2 epoch (loss 0.5859):  24%|██▎       | 198/840 [12:29<42:06,  3.94s/it]
Training 1/2 epoch (loss 0.5859):  24%|██▎       | 199/840 [12:29<39:15,  3.67s/it]
Training 1/2 epoch (loss 0.5547):  24%|██▎       | 199/840 [12:33<39:15,  3.67s/it]
Training 1/2 epoch (loss 0.5547):  24%|██▍       | 200/840 [12:33<38:27,  3.61s/it]
Training 1/2 epoch (loss 0.6406):  24%|██▍       | 200/840 [12:37<38:27,  3.61s/it]
Training 1/2 epoch (loss 0.6406):  24%|██▍       | 201/840 [12:37<38:38,  3.63s/it]
Training 1/2 epoch (loss 0.5586):  24%|██▍       | 201/840 [12:40<38:38,  3.63s/it]
Training 1/2 epoch (loss 0.5586):  24%|██▍       | 202/840 [12:40<38:21,  3.61s/it]
Training 1/2 epoch (loss 0.5352):  24%|██▍       | 202/840 [12:43<38:21,  3.61s/it]
Training 1/2 epoch (loss 0.5352):  24%|██▍       | 203/840 [12:43<35:06,  3.31s/it]
Training 1/2 epoch (loss 0.6211):  24%|██▍       | 203/840 [12:47<35:06,  3.31s/it]
Training 1/2 epoch (loss 0.6211):  24%|██▍       | 204/840 [12:47<36:36,  3.45s/it]
Training 1/2 epoch (loss 0.5469):  24%|██▍       | 204/840 [12:51<36:36,  3.45s/it]
Training 1/2 epoch (loss 0.5469):  24%|██▍       | 205/840 [12:51<41:09,  3.89s/it]
Training 1/2 epoch (loss 0.5469):  24%|██▍       | 205/840 [12:55<41:09,  3.89s/it]
Training 1/2 epoch (loss 0.5469):  25%|██▍       | 206/840 [12:55<39:41,  3.76s/it]
Training 1/2 epoch (loss 0.5664):  25%|██▍       | 206/840 [12:58<39:41,  3.76s/it]
Training 1/2 epoch (loss 0.5664):  25%|██▍       | 207/840 [12:58<38:05,  3.61s/it]
Training 1/2 epoch (loss 0.5234):  25%|██▍       | 207/840 [13:03<38:05,  3.61s/it]
Training 1/2 epoch (loss 0.5234):  25%|██▍       | 208/840 [13:03<41:36,  3.95s/it]
Training 1/2 epoch (loss 0.4707):  25%|██▍       | 208/840 [13:06<41:36,  3.95s/it]
Training 1/2 epoch (loss 0.4707):  25%|██▍       | 209/840 [13:06<37:46,  3.59s/it]
Training 1/2 epoch (loss 0.5781):  25%|██▍       | 209/840 [13:09<37:46,  3.59s/it]
Training 1/2 epoch (loss 0.5781):  25%|██▌       | 210/840 [13:09<36:41,  3.49s/it]
Training 1/2 epoch (loss 0.6641):  25%|██▌       | 210/840 [13:14<36:41,  3.49s/it]
Training 1/2 epoch (loss 0.6641):  25%|██▌       | 211/840 [13:14<42:53,  4.09s/it]
Training 1/2 epoch (loss 0.5586):  25%|██▌       | 211/840 [13:17<42:53,  4.09s/it]
Training 1/2 epoch (loss 0.5586):  25%|██▌       | 212/840 [13:17<39:21,  3.76s/it]
Training 1/2 epoch (loss 0.4805):  25%|██▌       | 212/840 [13:21<39:21,  3.76s/it]
Training 1/2 epoch (loss 0.4805):  25%|██▌       | 213/840 [13:21<37:16,  3.57s/it]
Training 1/2 epoch (loss 0.6680):  25%|██▌       | 213/840 [13:24<37:16,  3.57s/it]
Training 1/2 epoch (loss 0.6680):  25%|██▌       | 214/840 [13:24<37:07,  3.56s/it]
Training 1/2 epoch (loss 0.5117):  25%|██▌       | 214/840 [13:30<37:07,  3.56s/it]
Training 1/2 epoch (loss 0.5117):  26%|██▌       | 215/840 [13:30<43:01,  4.13s/it]
Training 1/2 epoch (loss 0.5938):  26%|██▌       | 215/840 [13:34<43:01,  4.13s/it]
Training 1/2 epoch (loss 0.5938):  26%|██▌       | 216/840 [13:34<42:56,  4.13s/it]
Training 1/2 epoch (loss 0.5234):  26%|██▌       | 216/840 [13:37<42:56,  4.13s/it]
Training 1/2 epoch (loss 0.5234):  26%|██▌       | 217/840 [13:37<42:01,  4.05s/it]
Training 1/2 epoch (loss 0.4980):  26%|██▌       | 217/840 [13:42<42:01,  4.05s/it]
Training 1/2 epoch (loss 0.4980):  26%|██▌       | 218/840 [13:42<44:51,  4.33s/it]
Training 1/2 epoch (loss 0.6172):  26%|██▌       | 218/840 [13:46<44:51,  4.33s/it]
Training 1/2 epoch (loss 0.6172):  26%|██▌       | 219/840 [13:46<42:43,  4.13s/it]
Training 1/2 epoch (loss 0.4844):  26%|██▌       | 219/840 [13:50<42:43,  4.13s/it]
Training 1/2 epoch (loss 0.4844):  26%|██▌       | 220/840 [13:50<43:17,  4.19s/it]
Training 1/2 epoch (loss 0.5000):  26%|██▌       | 220/840 [13:53<43:17,  4.19s/it]
Training 1/2 epoch (loss 0.5000):  26%|██▋       | 221/840 [13:53<38:47,  3.76s/it]
Training 1/2 epoch (loss 0.5000):  26%|██▋       | 221/840 [13:58<38:47,  3.76s/it]
Training 1/2 epoch (loss 0.5000):  26%|██▋       | 222/840 [13:58<42:18,  4.11s/it]
Training 1/2 epoch (loss 0.6484):  26%|██▋       | 222/840 [14:01<42:18,  4.11s/it]
Training 1/2 epoch (loss 0.6484):  27%|██▋       | 223/840 [14:01<39:47,  3.87s/it]
Training 1/2 epoch (loss 0.5547):  27%|██▋       | 223/840 [14:04<39:47,  3.87s/it]
Training 1/2 epoch (loss 0.5547):  27%|██▋       | 224/840 [14:04<35:40,  3.48s/it]
Training 1/2 epoch (loss 0.5312):  27%|██▋       | 224/840 [14:08<35:40,  3.48s/it]
Training 1/2 epoch (loss 0.5312):  27%|██▋       | 225/840 [14:08<38:42,  3.78s/it]
Training 1/2 epoch (loss 0.5078):  27%|██▋       | 225/840 [14:12<38:42,  3.78s/it]
Training 1/2 epoch (loss 0.5078):  27%|██▋       | 226/840 [14:12<38:07,  3.73s/it]
Training 1/2 epoch (loss 0.5078):  27%|██▋       | 226/840 [14:15<38:07,  3.73s/it]
Training 1/2 epoch (loss 0.5078):  27%|██▋       | 227/840 [14:15<36:02,  3.53s/it]
Training 1/2 epoch (loss 0.6602):  27%|██▋       | 227/840 [14:21<36:02,  3.53s/it]
Training 1/2 epoch (loss 0.6602):  27%|██▋       | 228/840 [14:21<42:18,  4.15s/it]
Training 1/2 epoch (loss 0.5156):  27%|██▋       | 228/840 [14:24<42:18,  4.15s/it]
Training 1/2 epoch (loss 0.5156):  27%|██▋       | 229/840 [14:24<39:52,  3.92s/it]
Training 1/2 epoch (loss 0.5938):  27%|██▋       | 229/840 [14:29<39:52,  3.92s/it]
Training 1/2 epoch (loss 0.5938):  27%|██▋       | 230/840 [14:29<41:38,  4.10s/it]
Training 1/2 epoch (loss 0.5469):  27%|██▋       | 230/840 [14:32<41:38,  4.10s/it]
Training 1/2 epoch (loss 0.5469):  28%|██▊       | 231/840 [14:32<37:52,  3.73s/it]
Training 1/2 epoch (loss 0.5586):  28%|██▊       | 231/840 [14:35<37:52,  3.73s/it]
Training 1/2 epoch (loss 0.5586):  28%|██▊       | 232/840 [14:35<36:11,  3.57s/it]
Training 1/2 epoch (loss 0.5898):  28%|██▊       | 232/840 [14:39<36:11,  3.57s/it]
Training 1/2 epoch (loss 0.5898):  28%|██▊       | 233/840 [14:39<39:18,  3.89s/it]
Training 1/2 epoch (loss 0.5742):  28%|██▊       | 233/840 [14:42<39:18,  3.89s/it]
Training 1/2 epoch (loss 0.5742):  28%|██▊       | 234/840 [14:42<36:07,  3.58s/it]
Training 1/2 epoch (loss 0.6562):  28%|██▊       | 234/840 [14:45<36:07,  3.58s/it]
Training 1/2 epoch (loss 0.6562):  28%|██▊       | 235/840 [14:45<33:48,  3.35s/it]
Training 1/2 epoch (loss 0.5938):  28%|██▊       | 235/840 [14:51<33:48,  3.35s/it]
Training 1/2 epoch (loss 0.5938):  28%|██▊       | 236/840 [14:51<40:30,  4.02s/it]
Training 1/2 epoch (loss 0.4961):  28%|██▊       | 236/840 [14:55<40:30,  4.02s/it]
Training 1/2 epoch (loss 0.4961):  28%|██▊       | 237/840 [14:55<41:53,  4.17s/it]
Training 1/2 epoch (loss 0.6016):  28%|██▊       | 237/840 [14:58<41:53,  4.17s/it]
Training 1/2 epoch (loss 0.6016):  28%|██▊       | 238/840 [14:58<38:56,  3.88s/it]
Training 1/2 epoch (loss 0.6172):  28%|██▊       | 238/840 [15:04<38:56,  3.88s/it]
Training 1/2 epoch (loss 0.6172):  28%|██▊       | 239/840 [15:04<43:34,  4.35s/it]
Training 1/2 epoch (loss 0.5039):  28%|██▊       | 239/840 [15:08<43:34,  4.35s/it]
Training 1/2 epoch (loss 0.5039):  29%|██▊       | 240/840 [15:08<41:52,  4.19s/it]
Training 1/2 epoch (loss 0.5078):  29%|██▊       | 240/840 [15:12<41:52,  4.19s/it]
Training 1/2 epoch (loss 0.5078):  29%|██▊       | 241/840 [15:12<42:46,  4.28s/it]
Training 1/2 epoch (loss 0.5078):  29%|██▊       | 241/840 [15:16<42:46,  4.28s/it]
Training 1/2 epoch (loss 0.5078):  29%|██▉       | 242/840 [15:16<41:55,  4.21s/it]
Training 1/2 epoch (loss 0.5859):  29%|██▉       | 242/840 [15:20<41:55,  4.21s/it]
Training 1/2 epoch (loss 0.5859):  29%|██▉       | 243/840 [15:20<39:42,  3.99s/it]
Training 1/2 epoch (loss 0.5430):  29%|██▉       | 243/840 [15:25<39:42,  3.99s/it]
Training 1/2 epoch (loss 0.5430):  29%|██▉       | 244/840 [15:25<44:13,  4.45s/it]
Training 1/2 epoch (loss 0.5430):  29%|██▉       | 244/840 [15:28<44:13,  4.45s/it]
Training 1/2 epoch (loss 0.5430):  29%|██▉       | 245/840 [15:28<40:21,  4.07s/it]
Training 1/2 epoch (loss 0.6172):  29%|██▉       | 245/840 [15:31<40:21,  4.07s/it]
Training 1/2 epoch (loss 0.6172):  29%|██▉       | 246/840 [15:31<37:37,  3.80s/it]
Training 1/2 epoch (loss 0.5625):  29%|██▉       | 246/840 [15:34<37:37,  3.80s/it]
Training 1/2 epoch (loss 0.5625):  29%|██▉       | 247/840 [15:34<33:59,  3.44s/it]
Training 1/2 epoch (loss 0.5469):  29%|██▉       | 247/840 [15:40<33:59,  3.44s/it]
Training 1/2 epoch (loss 0.5469):  30%|██▉       | 248/840 [15:40<40:13,  4.08s/it]
Training 1/2 epoch (loss 0.5664):  30%|██▉       | 248/840 [15:43<40:13,  4.08s/it]
Training 1/2 epoch (loss 0.5664):  30%|██▉       | 249/840 [15:43<37:10,  3.77s/it]
Training 1/2 epoch (loss 0.5430):  30%|██▉       | 249/840 [15:46<37:10,  3.77s/it]
Training 1/2 epoch (loss 0.5430):  30%|██▉       | 250/840 [15:46<34:46,  3.54s/it]
Training 1/2 epoch (loss 0.5156):  30%|██▉       | 250/840 [15:49<34:46,  3.54s/it]
Training 1/2 epoch (loss 0.5156):  30%|██▉       | 251/840 [15:49<35:15,  3.59s/it]
Training 1/2 epoch (loss 0.5195):  30%|██▉       | 251/840 [15:53<35:15,  3.59s/it]
Training 1/2 epoch (loss 0.5195):  30%|███       | 252/840 [15:53<34:10,  3.49s/it]
Training 1/2 epoch (loss 0.5352):  30%|███       | 252/840 [15:56<34:10,  3.49s/it]
Training 1/2 epoch (loss 0.5352):  30%|███       | 253/840 [15:56<34:34,  3.53s/it]
Training 1/2 epoch (loss 0.5625):  30%|███       | 253/840 [16:00<34:34,  3.53s/it]
Training 1/2 epoch (loss 0.5625):  30%|███       | 254/840 [16:00<35:41,  3.65s/it]
Training 1/2 epoch (loss 0.5391):  30%|███       | 254/840 [16:04<35:41,  3.65s/it]
Training 1/2 epoch (loss 0.5391):  30%|███       | 255/840 [16:04<34:42,  3.56s/it]
Training 1/2 epoch (loss 0.5391):  30%|███       | 255/840 [16:07<34:42,  3.56s/it]
Training 1/2 epoch (loss 0.5391):  30%|███       | 256/840 [16:07<35:35,  3.66s/it]
Training 1/2 epoch (loss 0.5547):  30%|███       | 256/840 [16:12<35:35,  3.66s/it]
Training 1/2 epoch (loss 0.5547):  31%|███       | 257/840 [16:12<38:50,  4.00s/it]
Training 1/2 epoch (loss 0.4883):  31%|███       | 257/840 [16:16<38:50,  4.00s/it]
Training 1/2 epoch (loss 0.4883):  31%|███       | 258/840 [16:16<36:53,  3.80s/it]
Training 1/2 epoch (loss 0.6133):  31%|███       | 258/840 [16:19<36:53,  3.80s/it]
Training 1/2 epoch (loss 0.6133):  31%|███       | 259/840 [16:19<34:15,  3.54s/it]
Training 1/2 epoch (loss 0.6641):  31%|███       | 259/840 [16:22<34:15,  3.54s/it]
Training 1/2 epoch (loss 0.6641):  31%|███       | 260/840 [16:22<34:41,  3.59s/it]
Training 1/2 epoch (loss 0.4961):  31%|███       | 260/840 [16:25<34:41,  3.59s/it]
Training 1/2 epoch (loss 0.4961):  31%|███       | 261/840 [16:25<31:51,  3.30s/it]
Training 1/2 epoch (loss 0.5273):  31%|███       | 261/840 [16:28<31:51,  3.30s/it]
Training 1/2 epoch (loss 0.5273):  31%|███       | 262/840 [16:28<32:32,  3.38s/it]
Training 1/2 epoch (loss 0.5430):  31%|███       | 262/840 [16:31<32:32,  3.38s/it]
Training 1/2 epoch (loss 0.5430):  31%|███▏      | 263/840 [16:31<30:28,  3.17s/it]
Training 1/2 epoch (loss 0.5469):  31%|███▏      | 263/840 [16:34<30:28,  3.17s/it]
Training 1/2 epoch (loss 0.5469):  31%|███▏      | 264/840 [16:34<28:40,  2.99s/it]
Training 1/2 epoch (loss 0.4570):  31%|███▏      | 264/840 [16:37<28:40,  2.99s/it]
Training 1/2 epoch (loss 0.4570):  32%|███▏      | 265/840 [16:37<28:23,  2.96s/it]
Training 1/2 epoch (loss 0.8672):  32%|███▏      | 265/840 [16:42<28:23,  2.96s/it]
Training 1/2 epoch (loss 0.8672):  32%|███▏      | 266/840 [16:42<35:43,  3.73s/it]
Training 1/2 epoch (loss 0.6367):  32%|███▏      | 266/840 [16:45<35:43,  3.73s/it]
Training 1/2 epoch (loss 0.6367):  32%|███▏      | 267/840 [16:45<33:34,  3.52s/it]
Training 1/2 epoch (loss 0.4766):  32%|███▏      | 267/840 [16:48<33:34,  3.52s/it]
Training 1/2 epoch (loss 0.4766):  32%|███▏      | 268/840 [16:48<32:47,  3.44s/it]
Training 1/2 epoch (loss 0.4961):  32%|███▏      | 268/840 [16:51<32:47,  3.44s/it]
Training 1/2 epoch (loss 0.4961):  32%|███▏      | 269/840 [16:51<30:15,  3.18s/it]
Training 1/2 epoch (loss 0.5000):  32%|███▏      | 269/840 [16:54<30:15,  3.18s/it]
Training 1/2 epoch (loss 0.5000):  32%|███▏      | 270/840 [16:54<30:43,  3.23s/it]
Training 1/2 epoch (loss 0.5352):  32%|███▏      | 270/840 [16:59<30:43,  3.23s/it]
Training 1/2 epoch (loss 0.5352):  32%|███▏      | 271/840 [16:59<35:10,  3.71s/it]
Training 1/2 epoch (loss 0.5156):  32%|███▏      | 271/840 [17:02<35:10,  3.71s/it]
Training 1/2 epoch (loss 0.5156):  32%|███▏      | 272/840 [17:02<33:09,  3.50s/it]
Training 1/2 epoch (loss 0.5000):  32%|███▏      | 272/840 [17:06<33:09,  3.50s/it]
Training 1/2 epoch (loss 0.5000):  32%|███▎      | 273/840 [17:06<35:11,  3.72s/it]
Training 1/2 epoch (loss 0.5039):  32%|███▎      | 273/840 [17:11<35:11,  3.72s/it]
Training 1/2 epoch (loss 0.5039):  33%|███▎      | 274/840 [17:11<36:20,  3.85s/it]
Training 1/2 epoch (loss 0.5625):  33%|███▎      | 274/840 [17:15<36:20,  3.85s/it]
Training 1/2 epoch (loss 0.5625):  33%|███▎      | 275/840 [17:15<39:02,  4.15s/it]
Training 1/2 epoch (loss 0.5156):  33%|███▎      | 275/840 [17:20<39:02,  4.15s/it]
Training 1/2 epoch (loss 0.5156):  33%|███▎      | 276/840 [17:20<39:20,  4.18s/it]
Training 1/2 epoch (loss 0.5977):  33%|███▎      | 276/840 [17:23<39:20,  4.18s/it]
Training 1/2 epoch (loss 0.5977):  33%|███▎      | 277/840 [17:23<36:41,  3.91s/it]
Training 1/2 epoch (loss 0.4336):  33%|███▎      | 277/840 [17:27<36:41,  3.91s/it]
Training 1/2 epoch (loss 0.4336):  33%|███▎      | 278/840 [17:27<36:41,  3.92s/it]
Training 1/2 epoch (loss 0.5586):  33%|███▎      | 278/840 [17:30<36:41,  3.92s/it]
Training 1/2 epoch (loss 0.5586):  33%|███▎      | 279/840 [17:30<35:48,  3.83s/it]
Training 1/2 epoch (loss 0.5273):  33%|███▎      | 279/840 [17:33<35:48,  3.83s/it]
Training 1/2 epoch (loss 0.5273):  33%|███▎      | 280/840 [17:33<33:05,  3.55s/it]
Training 1/2 epoch (loss 0.5781):  33%|███▎      | 280/840 [17:37<33:05,  3.55s/it]
Training 1/2 epoch (loss 0.5781):  33%|███▎      | 281/840 [17:37<32:55,  3.53s/it]
Training 1/2 epoch (loss 0.6250):  33%|███▎      | 281/840 [17:40<32:55,  3.53s/it]
Training 1/2 epoch (loss 0.6250):  34%|███▎      | 282/840 [17:40<31:44,  3.41s/it]
Training 1/2 epoch (loss 0.5547):  34%|███▎      | 282/840 [17:44<31:44,  3.41s/it]
Training 1/2 epoch (loss 0.5547):  34%|███▎      | 283/840 [17:44<33:14,  3.58s/it]
Training 1/2 epoch (loss 0.5781):  34%|███▎      | 283/840 [17:47<33:14,  3.58s/it]
Training 1/2 epoch (loss 0.5781):  34%|███▍      | 284/840 [17:47<31:11,  3.37s/it]
Training 1/2 epoch (loss 0.5430):  34%|███▍      | 284/840 [17:51<31:11,  3.37s/it]
Training 1/2 epoch (loss 0.5430):  34%|███▍      | 285/840 [17:51<32:10,  3.48s/it]
Training 1/2 epoch (loss 0.5273):  34%|███▍      | 285/840 [17:54<32:10,  3.48s/it]
Training 1/2 epoch (loss 0.5273):  34%|███▍      | 286/840 [17:54<32:11,  3.49s/it]
Training 1/2 epoch (loss 0.4414):  34%|███▍      | 286/840 [17:58<32:11,  3.49s/it]
Training 1/2 epoch (loss 0.4414):  34%|███▍      | 287/840 [17:58<32:22,  3.51s/it]
Training 1/2 epoch (loss 0.5000):  34%|███▍      | 287/840 [18:01<32:22,  3.51s/it]
Training 1/2 epoch (loss 0.5000):  34%|███▍      | 288/840 [18:01<30:29,  3.31s/it]
Training 1/2 epoch (loss 0.5586):  34%|███▍      | 288/840 [18:04<30:29,  3.31s/it]
Training 1/2 epoch (loss 0.5586):  34%|███▍      | 289/840 [18:04<29:46,  3.24s/it]
Training 1/2 epoch (loss 0.5898):  34%|███▍      | 289/840 [18:07<29:46,  3.24s/it]
Training 1/2 epoch (loss 0.5898):  35%|███▍      | 290/840 [18:07<30:50,  3.36s/it]
Training 1/2 epoch (loss 0.5195):  35%|███▍      | 290/840 [18:12<30:50,  3.36s/it]
Training 1/2 epoch (loss 0.5195):  35%|███▍      | 291/840 [18:12<33:19,  3.64s/it]
Training 1/2 epoch (loss 0.4844):  35%|███▍      | 291/840 [18:16<33:19,  3.64s/it]
Training 1/2 epoch (loss 0.4844):  35%|███▍      | 292/840 [18:16<36:47,  4.03s/it]
Training 1/2 epoch (loss 0.6523):  35%|███▍      | 292/840 [18:21<36:47,  4.03s/it]
Training 1/2 epoch (loss 0.6523):  35%|███▍      | 293/840 [18:21<37:39,  4.13s/it]
Training 1/2 epoch (loss 0.5547):  35%|███▍      | 293/840 [18:24<37:39,  4.13s/it]
Training 1/2 epoch (loss 0.5547):  35%|███▌      | 294/840 [18:24<34:43,  3.82s/it]
Training 1/2 epoch (loss 0.5625):  35%|███▌      | 294/840 [18:27<34:43,  3.82s/it]
Training 1/2 epoch (loss 0.5625):  35%|███▌      | 295/840 [18:27<31:44,  3.49s/it]
Training 1/2 epoch (loss 0.5117):  35%|███▌      | 295/840 [18:30<31:44,  3.49s/it]
Training 1/2 epoch (loss 0.5117):  35%|███▌      | 296/840 [18:30<31:47,  3.51s/it]
Training 1/2 epoch (loss 0.5625):  35%|███▌      | 296/840 [18:34<31:47,  3.51s/it]
Training 1/2 epoch (loss 0.5625):  35%|███▌      | 297/840 [18:34<32:10,  3.56s/it]
Training 1/2 epoch (loss 0.4434):  35%|███▌      | 297/840 [18:38<32:10,  3.56s/it]
Training 1/2 epoch (loss 0.4434):  35%|███▌      | 298/840 [18:38<32:44,  3.63s/it]
Training 1/2 epoch (loss 0.5547):  35%|███▌      | 298/840 [18:41<32:44,  3.63s/it]
Training 1/2 epoch (loss 0.5547):  36%|███▌      | 299/840 [18:41<32:18,  3.58s/it]
Training 1/2 epoch (loss 0.5938):  36%|███▌      | 299/840 [18:44<32:18,  3.58s/it]
Training 1/2 epoch (loss 0.5938):  36%|███▌      | 300/840 [18:44<30:37,  3.40s/it]
Training 1/2 epoch (loss 0.5742):  36%|███▌      | 300/840 [18:49<30:37,  3.40s/it]
Training 1/2 epoch (loss 0.5742):  36%|███▌      | 301/840 [18:49<34:22,  3.83s/it]
Training 1/2 epoch (loss 0.5820):  36%|███▌      | 301/840 [18:53<34:22,  3.83s/it]
Training 1/2 epoch (loss 0.5820):  36%|███▌      | 302/840 [18:53<34:19,  3.83s/it]
Training 1/2 epoch (loss 0.4883):  36%|███▌      | 302/840 [18:58<34:19,  3.83s/it]
Training 1/2 epoch (loss 0.4883):  36%|███▌      | 303/840 [18:58<38:48,  4.34s/it]
Training 1/2 epoch (loss 0.6367):  36%|███▌      | 303/840 [19:02<38:48,  4.34s/it]
Training 1/2 epoch (loss 0.6367):  36%|███▌      | 304/840 [19:02<36:01,  4.03s/it]
Training 1/2 epoch (loss 0.5156):  36%|███▌      | 304/840 [19:05<36:01,  4.03s/it]
Training 1/2 epoch (loss 0.5156):  36%|███▋      | 305/840 [19:05<34:55,  3.92s/it]
Training 1/2 epoch (loss 0.6445):  36%|███▋      | 305/840 [19:10<34:55,  3.92s/it]
Training 1/2 epoch (loss 0.6445):  36%|███▋      | 306/840 [19:10<37:38,  4.23s/it]
Training 1/2 epoch (loss 0.4141):  36%|███▋      | 306/840 [19:14<37:38,  4.23s/it]
Training 1/2 epoch (loss 0.4141):  37%|███▋      | 307/840 [19:14<35:52,  4.04s/it]
Training 1/2 epoch (loss 0.4805):  37%|███▋      | 307/840 [19:17<35:52,  4.04s/it]
Training 1/2 epoch (loss 0.4805):  37%|███▋      | 308/840 [19:17<33:45,  3.81s/it]
Training 1/2 epoch (loss 0.5547):  37%|███▋      | 308/840 [19:21<33:45,  3.81s/it]
Training 1/2 epoch (loss 0.5547):  37%|███▋      | 309/840 [19:21<33:24,  3.77s/it]
Training 1/2 epoch (loss 0.5039):  37%|███▋      | 309/840 [19:24<33:24,  3.77s/it]
Training 1/2 epoch (loss 0.5039):  37%|███▋      | 310/840 [19:24<31:46,  3.60s/it]
Training 1/2 epoch (loss 0.5312):  37%|███▋      | 310/840 [19:28<31:46,  3.60s/it]
Training 1/2 epoch (loss 0.5312):  37%|███▋      | 311/840 [19:28<31:44,  3.60s/it]
Training 1/2 epoch (loss 0.4941):  37%|███▋      | 311/840 [19:31<31:44,  3.60s/it]
Training 1/2 epoch (loss 0.4941):  37%|███▋      | 312/840 [19:31<31:02,  3.53s/it]
Training 1/2 epoch (loss 0.5547):  37%|███▋      | 312/840 [19:33<31:02,  3.53s/it]
Training 1/2 epoch (loss 0.5547):  37%|███▋      | 313/840 [19:33<28:15,  3.22s/it]
Training 1/2 epoch (loss 0.3809):  37%|███▋      | 313/840 [19:38<28:15,  3.22s/it]
Training 1/2 epoch (loss 0.3809):  37%|███▋      | 314/840 [19:38<30:37,  3.49s/it]
Training 1/2 epoch (loss 0.5078):  37%|███▋      | 314/840 [19:42<30:37,  3.49s/it]
Training 1/2 epoch (loss 0.5078):  38%|███▊      | 315/840 [19:42<32:55,  3.76s/it]
Training 1/2 epoch (loss 0.5469):  38%|███▊      | 315/840 [19:45<32:55,  3.76s/it]
Training 1/2 epoch (loss 0.5469):  38%|███▊      | 316/840 [19:45<32:19,  3.70s/it]
Training 1/2 epoch (loss 0.5742):  38%|███▊      | 316/840 [19:50<32:19,  3.70s/it]
Training 1/2 epoch (loss 0.5742):  38%|███▊      | 317/840 [19:50<33:53,  3.89s/it]
Training 1/2 epoch (loss 0.6328):  38%|███▊      | 317/840 [19:53<33:53,  3.89s/it]
Training 1/2 epoch (loss 0.6328):  38%|███▊      | 318/840 [19:53<31:55,  3.67s/it]
Training 1/2 epoch (loss 0.4883):  38%|███▊      | 318/840 [19:58<31:55,  3.67s/it]
Training 1/2 epoch (loss 0.4883):  38%|███▊      | 319/840 [19:58<36:24,  4.19s/it]
Training 1/2 epoch (loss 0.5000):  38%|███▊      | 319/840 [20:01<36:24,  4.19s/it]
Training 1/2 epoch (loss 0.5000):  38%|███▊      | 320/840 [20:01<33:04,  3.82s/it]
Training 1/2 epoch (loss 0.5078):  38%|███▊      | 320/840 [20:07<33:04,  3.82s/it]
Training 1/2 epoch (loss 0.5078):  38%|███▊      | 321/840 [20:07<37:00,  4.28s/it]
Training 1/2 epoch (loss 0.4727):  38%|███▊      | 321/840 [20:11<37:00,  4.28s/it]
Training 1/2 epoch (loss 0.4727):  38%|███▊      | 322/840 [20:11<37:27,  4.34s/it]
Training 1/2 epoch (loss 0.5898):  38%|███▊      | 322/840 [20:16<37:27,  4.34s/it]
Training 1/2 epoch (loss 0.5898):  38%|███▊      | 323/840 [20:16<38:02,  4.41s/it]
Training 1/2 epoch (loss 0.5977):  38%|███▊      | 323/840 [20:19<38:02,  4.41s/it]
Training 1/2 epoch (loss 0.5977):  39%|███▊      | 324/840 [20:19<35:35,  4.14s/it]
Training 1/2 epoch (loss 0.5742):  39%|███▊      | 324/840 [20:22<35:35,  4.14s/it]
Training 1/2 epoch (loss 0.5742):  39%|███▊      | 325/840 [20:22<32:21,  3.77s/it]
Training 1/2 epoch (loss 0.5938):  39%|███▊      | 325/840 [20:27<32:21,  3.77s/it]
Training 1/2 epoch (loss 0.5938):  39%|███▉      | 326/840 [20:27<34:49,  4.06s/it]
Training 1/2 epoch (loss 0.4590):  39%|███▉      | 326/840 [20:32<34:49,  4.06s/it]
Training 1/2 epoch (loss 0.4590):  39%|███▉      | 327/840 [20:32<38:13,  4.47s/it]
Training 1/2 epoch (loss 0.4297):  39%|███▉      | 327/840 [20:36<38:13,  4.47s/it]
Training 1/2 epoch (loss 0.4297):  39%|███▉      | 328/840 [20:36<36:26,  4.27s/it]
Training 1/2 epoch (loss 0.4004):  39%|███▉      | 328/840 [20:40<36:26,  4.27s/it]
Training 1/2 epoch (loss 0.4004):  39%|███▉      | 329/840 [20:40<34:04,  4.00s/it]
Training 1/2 epoch (loss 0.4961):  39%|███▉      | 329/840 [20:43<34:04,  4.00s/it]
Training 1/2 epoch (loss 0.4961):  39%|███▉      | 330/840 [20:43<31:47,  3.74s/it]
Training 1/2 epoch (loss 0.4863):  39%|███▉      | 330/840 [20:46<31:47,  3.74s/it]
Training 1/2 epoch (loss 0.4863):  39%|███▉      | 331/840 [20:46<30:15,  3.57s/it]
Training 1/2 epoch (loss 0.5859):  39%|███▉      | 331/840 [20:48<30:15,  3.57s/it]
Training 1/2 epoch (loss 0.5859):  40%|███▉      | 332/840 [20:48<27:50,  3.29s/it]
Training 1/2 epoch (loss 0.5859):  40%|███▉      | 332/840 [20:54<27:50,  3.29s/it]
Training 1/2 epoch (loss 0.5859):  40%|███▉      | 333/840 [20:54<33:17,  3.94s/it]
Training 1/2 epoch (loss 0.6484):  40%|███▉      | 333/840 [20:59<33:17,  3.94s/it]
Training 1/2 epoch (loss 0.6484):  40%|███▉      | 334/840 [20:59<37:23,  4.43s/it]
Training 1/2 epoch (loss 0.5078):  40%|███▉      | 334/840 [21:05<37:23,  4.43s/it]
Training 1/2 epoch (loss 0.5078):  40%|███▉      | 335/840 [21:05<40:00,  4.75s/it]
Training 1/2 epoch (loss 0.4844):  40%|███▉      | 335/840 [21:09<40:00,  4.75s/it]
Training 1/2 epoch (loss 0.4844):  40%|████      | 336/840 [21:09<36:58,  4.40s/it]
Training 1/2 epoch (loss 0.6250):  40%|████      | 336/840 [21:12<36:58,  4.40s/it]
Training 1/2 epoch (loss 0.6250):  40%|████      | 337/840 [21:12<35:15,  4.21s/it]
Training 1/2 epoch (loss 0.5781):  40%|████      | 337/840 [21:15<35:15,  4.21s/it]
Training 1/2 epoch (loss 0.5781):  40%|████      | 338/840 [21:15<32:30,  3.89s/it]
Training 1/2 epoch (loss 0.4727):  40%|████      | 338/840 [21:19<32:30,  3.89s/it]
Training 1/2 epoch (loss 0.4727):  40%|████      | 339/840 [21:19<31:08,  3.73s/it]
Training 1/2 epoch (loss 0.5820):  40%|████      | 339/840 [21:23<31:08,  3.73s/it]
Training 1/2 epoch (loss 0.5820):  40%|████      | 340/840 [21:23<31:56,  3.83s/it]
Training 1/2 epoch (loss 0.4844):  40%|████      | 340/840 [21:26<31:56,  3.83s/it]
Training 1/2 epoch (loss 0.4844):  41%|████      | 341/840 [21:26<30:07,  3.62s/it]
Training 1/2 epoch (loss 0.5938):  41%|████      | 341/840 [21:30<30:07,  3.62s/it]
Training 1/2 epoch (loss 0.5938):  41%|████      | 342/840 [21:30<30:40,  3.69s/it]
Training 1/2 epoch (loss 0.6602):  41%|████      | 342/840 [21:33<30:40,  3.69s/it]
Training 1/2 epoch (loss 0.6602):  41%|████      | 343/840 [21:33<28:57,  3.50s/it]
Training 1/2 epoch (loss 0.5352):  41%|████      | 343/840 [21:38<28:57,  3.50s/it]
Training 1/2 epoch (loss 0.5352):  41%|████      | 344/840 [21:38<31:36,  3.82s/it]
Training 1/2 epoch (loss 0.6094):  41%|████      | 344/840 [21:40<31:36,  3.82s/it]
Training 1/2 epoch (loss 0.6094):  41%|████      | 345/840 [21:40<28:35,  3.47s/it]
Training 1/2 epoch (loss 0.5156):  41%|████      | 345/840 [21:43<28:35,  3.47s/it]
Training 1/2 epoch (loss 0.5156):  41%|████      | 346/840 [21:43<26:27,  3.21s/it]
Training 1/2 epoch (loss 0.5039):  41%|████      | 346/840 [21:45<26:27,  3.21s/it]
Training 1/2 epoch (loss 0.5039):  41%|████▏     | 347/840 [21:45<25:09,  3.06s/it]
Training 1/2 epoch (loss 0.4668):  41%|████▏     | 347/840 [21:49<25:09,  3.06s/it]
Training 1/2 epoch (loss 0.4668):  41%|████▏     | 348/840 [21:49<25:40,  3.13s/it]
Training 1/2 epoch (loss 0.5781):  41%|████▏     | 348/840 [21:53<25:40,  3.13s/it]
Training 1/2 epoch (loss 0.5781):  42%|████▏     | 349/840 [21:53<27:48,  3.40s/it]
Training 1/2 epoch (loss 0.4492):  42%|████▏     | 349/840 [21:56<27:48,  3.40s/it]
Training 1/2 epoch (loss 0.4492):  42%|████▏     | 350/840 [21:56<26:15,  3.22s/it]
Training 1/2 epoch (loss 0.4473):  42%|████▏     | 350/840 [22:01<26:15,  3.22s/it]
Training 1/2 epoch (loss 0.4473):  42%|████▏     | 351/840 [22:01<31:38,  3.88s/it]
Training 1/2 epoch (loss 0.4844):  42%|████▏     | 351/840 [22:04<31:38,  3.88s/it]
Training 1/2 epoch (loss 0.4844):  42%|████▏     | 352/840 [22:04<29:36,  3.64s/it]
Training 1/2 epoch (loss 0.4922):  42%|████▏     | 352/840 [22:08<29:36,  3.64s/it]
Training 1/2 epoch (loss 0.4922):  42%|████▏     | 353/840 [22:08<29:36,  3.65s/it]
Training 1/2 epoch (loss 0.5430):  42%|████▏     | 353/840 [22:11<29:36,  3.65s/it]
Training 1/2 epoch (loss 0.5430):  42%|████▏     | 354/840 [22:11<29:15,  3.61s/it]
Training 1/2 epoch (loss 0.5469):  42%|████▏     | 354/840 [22:14<29:15,  3.61s/it]
Training 1/2 epoch (loss 0.5469):  42%|████▏     | 355/840 [22:14<28:07,  3.48s/it]
Training 1/2 epoch (loss 0.5039):  42%|████▏     | 355/840 [22:18<28:07,  3.48s/it]
Training 1/2 epoch (loss 0.5039):  42%|████▏     | 356/840 [22:18<29:20,  3.64s/it]
Training 1/2 epoch (loss 0.4473):  42%|████▏     | 356/840 [22:23<29:20,  3.64s/it]
Training 1/2 epoch (loss 0.4473):  42%|████▎     | 357/840 [22:23<32:16,  4.01s/it]
Training 1/2 epoch (loss 0.5156):  42%|████▎     | 357/840 [22:28<32:16,  4.01s/it]
Training 1/2 epoch (loss 0.5156):  43%|████▎     | 358/840 [22:28<33:35,  4.18s/it]
Training 1/2 epoch (loss 0.4766):  43%|████▎     | 358/840 [22:33<33:35,  4.18s/it]
Training 1/2 epoch (loss 0.4766):  43%|████▎     | 359/840 [22:33<36:34,  4.56s/it]
Training 1/2 epoch (loss 0.5312):  43%|████▎     | 359/840 [22:37<36:34,  4.56s/it]
Training 1/2 epoch (loss 0.5312):  43%|████▎     | 360/840 [22:37<35:15,  4.41s/it]
Training 1/2 epoch (loss 0.4395):  43%|████▎     | 360/840 [22:41<35:15,  4.41s/it]
Training 1/2 epoch (loss 0.4395):  43%|████▎     | 361/840 [22:41<34:10,  4.28s/it]
Training 1/2 epoch (loss 0.4766):  43%|████▎     | 361/840 [22:44<34:10,  4.28s/it]
Training 1/2 epoch (loss 0.4766):  43%|████▎     | 362/840 [22:44<30:47,  3.86s/it]
Training 1/2 epoch (loss 0.4785):  43%|████▎     | 362/840 [22:48<30:47,  3.86s/it]
Training 1/2 epoch (loss 0.4785):  43%|████▎     | 363/840 [22:48<29:54,  3.76s/it]
Training 1/2 epoch (loss 0.4141):  43%|████▎     | 363/840 [22:53<29:54,  3.76s/it]
Training 1/2 epoch (loss 0.4141):  43%|████▎     | 364/840 [22:53<33:02,  4.17s/it]
Training 1/2 epoch (loss 0.5547):  43%|████▎     | 364/840 [22:57<33:02,  4.17s/it]
Training 1/2 epoch (loss 0.5547):  43%|████▎     | 365/840 [22:57<31:44,  4.01s/it]
Training 1/2 epoch (loss 0.5703):  43%|████▎     | 365/840 [23:02<31:44,  4.01s/it]
Training 1/2 epoch (loss 0.5703):  44%|████▎     | 366/840 [23:02<35:11,  4.45s/it]
Training 1/2 epoch (loss 0.3711):  44%|████▎     | 366/840 [23:06<35:11,  4.45s/it]
Training 1/2 epoch (loss 0.3711):  44%|████▎     | 367/840 [23:06<32:59,  4.18s/it]
Training 1/2 epoch (loss 0.5078):  44%|████▎     | 367/840 [23:10<32:59,  4.18s/it]
Training 1/2 epoch (loss 0.5078):  44%|████▍     | 368/840 [23:10<33:55,  4.31s/it]
Training 1/2 epoch (loss 0.6055):  44%|████▍     | 368/840 [23:13<33:55,  4.31s/it]
Training 1/2 epoch (loss 0.6055):  44%|████▍     | 369/840 [23:13<31:07,  3.97s/it]
Training 1/2 epoch (loss 0.7266):  44%|████▍     | 369/840 [23:19<31:07,  3.97s/it]
Training 1/2 epoch (loss 0.7266):  44%|████▍     | 370/840 [23:19<34:43,  4.43s/it]
Training 1/2 epoch (loss 0.6250):  44%|████▍     | 370/840 [23:24<34:43,  4.43s/it]
Training 1/2 epoch (loss 0.6250):  44%|████▍     | 371/840 [23:24<37:04,  4.74s/it]
Training 1/2 epoch (loss 0.3438):  44%|████▍     | 371/840 [23:30<37:04,  4.74s/it]
Training 1/2 epoch (loss 0.3438):  44%|████▍     | 372/840 [23:30<38:45,  4.97s/it]
Training 1/2 epoch (loss 0.4219):  44%|████▍     | 372/840 [23:33<38:45,  4.97s/it]
Training 1/2 epoch (loss 0.4219):  44%|████▍     | 373/840 [23:33<35:04,  4.51s/it]
Training 1/2 epoch (loss 0.4883):  44%|████▍     | 373/840 [23:37<35:04,  4.51s/it]
Training 1/2 epoch (loss 0.4883):  45%|████▍     | 374/840 [23:37<32:47,  4.22s/it]
Training 1/2 epoch (loss 0.5078):  45%|████▍     | 374/840 [23:41<32:47,  4.22s/it]
Training 1/2 epoch (loss 0.5078):  45%|████▍     | 375/840 [23:41<33:26,  4.32s/it]
Training 1/2 epoch (loss 0.5469):  45%|████▍     | 375/840 [23:46<33:26,  4.32s/it]
Training 1/2 epoch (loss 0.5469):  45%|████▍     | 376/840 [23:46<34:40,  4.48s/it]
Training 1/2 epoch (loss 0.4297):  45%|████▍     | 376/840 [23:50<34:40,  4.48s/it]
Training 1/2 epoch (loss 0.4297):  45%|████▍     | 377/840 [23:50<33:26,  4.33s/it]
Training 1/2 epoch (loss 0.4062):  45%|████▍     | 377/840 [23:54<33:26,  4.33s/it]
Training 1/2 epoch (loss 0.4062):  45%|████▌     | 378/840 [23:54<32:04,  4.17s/it]
Training 1/2 epoch (loss 0.4961):  45%|████▌     | 378/840 [23:57<32:04,  4.17s/it]
Training 1/2 epoch (loss 0.4961):  45%|████▌     | 379/840 [23:57<28:38,  3.73s/it]
Training 1/2 epoch (loss 0.4512):  45%|████▌     | 379/840 [24:02<28:38,  3.73s/it]
Training 1/2 epoch (loss 0.4512):  45%|████▌     | 380/840 [24:02<31:06,  4.06s/it]
Training 1/2 epoch (loss 0.5898):  45%|████▌     | 380/840 [24:04<31:06,  4.06s/it]
Training 1/2 epoch (loss 0.5898):  45%|████▌     | 381/840 [24:04<28:12,  3.69s/it]
Training 1/2 epoch (loss 0.4219):  45%|████▌     | 381/840 [24:09<28:12,  3.69s/it]
Training 1/2 epoch (loss 0.4219):  45%|████▌     | 382/840 [24:09<30:58,  4.06s/it]
Training 1/2 epoch (loss 0.5312):  45%|████▌     | 382/840 [24:12<30:58,  4.06s/it]
Training 1/2 epoch (loss 0.5312):  46%|████▌     | 383/840 [24:12<28:55,  3.80s/it]
Training 1/2 epoch (loss 0.4570):  46%|████▌     | 383/840 [24:16<28:55,  3.80s/it]
Training 1/2 epoch (loss 0.4570):  46%|████▌     | 384/840 [24:16<28:24,  3.74s/it]
Training 1/2 epoch (loss 0.4922):  46%|████▌     | 384/840 [24:19<28:24,  3.74s/it]
Training 1/2 epoch (loss 0.4922):  46%|████▌     | 385/840 [24:19<26:48,  3.54s/it]
Training 1/2 epoch (loss 0.5703):  46%|████▌     | 385/840 [24:22<26:48,  3.54s/it]
Training 1/2 epoch (loss 0.5703):  46%|████▌     | 386/840 [24:22<25:39,  3.39s/it]
Training 1/2 epoch (loss 0.5859):  46%|████▌     | 386/840 [24:25<25:39,  3.39s/it]
Training 1/2 epoch (loss 0.5859):  46%|████▌     | 387/840 [24:25<24:14,  3.21s/it]
Training 1/2 epoch (loss 0.5078):  46%|████▌     | 387/840 [24:29<24:14,  3.21s/it]
Training 1/2 epoch (loss 0.5078):  46%|████▌     | 388/840 [24:29<26:35,  3.53s/it]
Training 1/2 epoch (loss 0.4219):  46%|████▌     | 388/840 [24:32<26:35,  3.53s/it]
Training 1/2 epoch (loss 0.4219):  46%|████▋     | 389/840 [24:32<24:53,  3.31s/it]
Training 1/2 epoch (loss 0.3906):  46%|████▋     | 389/840 [24:35<24:53,  3.31s/it]
Training 1/2 epoch (loss 0.3906):  46%|████▋     | 390/840 [24:35<24:15,  3.23s/it]
Training 1/2 epoch (loss 0.5312):  46%|████▋     | 390/840 [24:38<24:15,  3.23s/it]
Training 1/2 epoch (loss 0.5312):  47%|████▋     | 391/840 [24:38<24:00,  3.21s/it]
Training 1/2 epoch (loss 0.4570):  47%|████▋     | 391/840 [24:41<24:00,  3.21s/it]
Training 1/2 epoch (loss 0.4570):  47%|████▋     | 392/840 [24:41<22:15,  2.98s/it]
Training 1/2 epoch (loss 0.5703):  47%|████▋     | 392/840 [24:44<22:15,  2.98s/it]
Training 1/2 epoch (loss 0.5703):  47%|████▋     | 393/840 [24:44<23:22,  3.14s/it]
Training 1/2 epoch (loss 0.5117):  47%|████▋     | 393/840 [24:47<23:22,  3.14s/it]
Training 1/2 epoch (loss 0.5117):  47%|████▋     | 394/840 [24:47<23:18,  3.14s/it]
Training 1/2 epoch (loss 0.4570):  47%|████▋     | 394/840 [24:50<23:18,  3.14s/it]
Training 1/2 epoch (loss 0.4570):  47%|████▋     | 395/840 [24:50<22:05,  2.98s/it]
Training 1/2 epoch (loss 0.4512):  47%|████▋     | 395/840 [24:53<22:05,  2.98s/it]
Training 1/2 epoch (loss 0.4512):  47%|████▋     | 396/840 [24:53<22:55,  3.10s/it]
Training 1/2 epoch (loss 0.4414):  47%|████▋     | 396/840 [24:57<22:55,  3.10s/it]
Training 1/2 epoch (loss 0.4414):  47%|████▋     | 397/840 [24:57<24:40,  3.34s/it]
Training 1/2 epoch (loss 0.4180):  47%|████▋     | 397/840 [25:01<24:40,  3.34s/it]
Training 1/2 epoch (loss 0.4180):  47%|████▋     | 398/840 [25:01<24:47,  3.37s/it]
Training 1/2 epoch (loss 0.6211):  47%|████▋     | 398/840 [25:06<24:47,  3.37s/it]
Training 1/2 epoch (loss 0.6211):  48%|████▊     | 399/840 [25:06<29:22,  4.00s/it]
Training 1/2 epoch (loss 0.5000):  48%|████▊     | 399/840 [25:12<29:22,  4.00s/it]
Training 1/2 epoch (loss 0.5000):  48%|████▊     | 400/840 [25:12<32:51,  4.48s/it]
Training 1/2 epoch (loss 0.4629):  48%|████▊     | 400/840 [25:16<32:51,  4.48s/it]
Training 1/2 epoch (loss 0.4629):  48%|████▊     | 401/840 [25:16<31:37,  4.32s/it]
Training 1/2 epoch (loss 0.4375):  48%|████▊     | 401/840 [25:21<31:37,  4.32s/it]
Training 1/2 epoch (loss 0.4375):  48%|████▊     | 402/840 [25:21<34:13,  4.69s/it]
Training 1/2 epoch (loss 0.4863):  48%|████▊     | 402/840 [25:27<34:13,  4.69s/it]
Training 1/2 epoch (loss 0.4863):  48%|████▊     | 403/840 [25:27<35:50,  4.92s/it]
Training 1/2 epoch (loss 0.4688):  48%|████▊     | 403/840 [25:30<35:50,  4.92s/it]
Training 1/2 epoch (loss 0.4688):  48%|████▊     | 404/840 [25:30<31:37,  4.35s/it]
Training 1/2 epoch (loss 0.5898):  48%|████▊     | 404/840 [25:32<31:37,  4.35s/it]
Training 1/2 epoch (loss 0.5898):  48%|████▊     | 405/840 [25:32<27:29,  3.79s/it]
Training 1/2 epoch (loss 0.5586):  48%|████▊     | 405/840 [25:37<27:29,  3.79s/it]
Training 1/2 epoch (loss 0.5586):  48%|████▊     | 406/840 [25:37<29:17,  4.05s/it]
Training 1/2 epoch (loss 0.5078):  48%|████▊     | 406/840 [25:42<29:17,  4.05s/it]
Training 1/2 epoch (loss 0.5078):  48%|████▊     | 407/840 [25:42<30:35,  4.24s/it]
Training 1/2 epoch (loss 0.5000):  48%|████▊     | 407/840 [25:46<30:35,  4.24s/it]
Training 1/2 epoch (loss 0.5000):  49%|████▊     | 408/840 [25:46<31:43,  4.41s/it]
Training 1/2 epoch (loss 0.4844):  49%|████▊     | 408/840 [25:49<31:43,  4.41s/it]
Training 1/2 epoch (loss 0.4844):  49%|████▊     | 409/840 [25:49<27:51,  3.88s/it]
Training 1/2 epoch (loss 0.4004):  49%|████▊     | 409/840 [25:55<27:51,  3.88s/it]
Training 1/2 epoch (loss 0.4004):  49%|████▉     | 410/840 [25:55<31:27,  4.39s/it]
Training 1/2 epoch (loss 0.5234):  49%|████▉     | 410/840 [25:58<31:27,  4.39s/it]
Training 1/2 epoch (loss 0.5234):  49%|████▉     | 411/840 [25:58<28:14,  3.95s/it]
Training 1/2 epoch (loss 0.4961):  49%|████▉     | 411/840 [26:01<28:14,  3.95s/it]
Training 1/2 epoch (loss 0.4961):  49%|████▉     | 412/840 [26:01<26:42,  3.74s/it]
Training 1/2 epoch (loss 0.4492):  49%|████▉     | 412/840 [26:04<26:42,  3.74s/it]
Training 1/2 epoch (loss 0.4492):  49%|████▉     | 413/840 [26:04<25:13,  3.54s/it]
Training 1/2 epoch (loss 0.4668):  49%|████▉     | 413/840 [26:08<25:13,  3.54s/it]
Training 1/2 epoch (loss 0.4668):  49%|████▉     | 414/840 [26:08<26:55,  3.79s/it]
Training 1/2 epoch (loss 0.4316):  49%|████▉     | 414/840 [26:12<26:55,  3.79s/it]
Training 1/2 epoch (loss 0.4316):  49%|████▉     | 415/840 [26:12<27:47,  3.92s/it]
Training 1/2 epoch (loss 0.5195):  49%|████▉     | 415/840 [26:17<27:47,  3.92s/it]
Training 1/2 epoch (loss 0.5195):  50%|████▉     | 416/840 [26:17<29:24,  4.16s/it]
Training 1/2 epoch (loss 0.4922):  50%|████▉     | 416/840 [26:21<29:24,  4.16s/it]
Training 1/2 epoch (loss 0.4922):  50%|████▉     | 417/840 [26:21<27:36,  3.92s/it]
Training 1/2 epoch (loss 0.4082):  50%|████▉     | 417/840 [26:24<27:36,  3.92s/it]
Training 1/2 epoch (loss 0.4082):  50%|████▉     | 418/840 [26:24<27:16,  3.88s/it]
Training 1/2 epoch (loss 0.5820):  50%|████▉     | 418/840 [26:27<27:16,  3.88s/it]
Training 1/2 epoch (loss 0.5820):  50%|████▉     | 419/840 [26:27<25:34,  3.64s/it]
Training 1/2 epoch (loss 0.4297):  50%|████▉     | 419/840 [26:31<25:34,  3.64s/it]
Training 1/2 epoch (loss 0.4297):  50%|█████     | 420/840 [26:31<25:35,  3.66s/it]
Training 2/2 epoch (loss 0.5664):  50%|█████     | 420/840 [26:34<25:35,  3.66s/it]
Training 2/2 epoch (loss 0.5664):  50%|█████     | 421/840 [26:34<23:03,  3.30s/it]
Training 2/2 epoch (loss 0.4414):  50%|█████     | 421/840 [26:37<23:03,  3.30s/it]
Training 2/2 epoch (loss 0.4414):  50%|█████     | 422/840 [26:37<23:57,  3.44s/it]
Training 2/2 epoch (loss 0.4531):  50%|█████     | 422/840 [26:40<23:57,  3.44s/it]
Training 2/2 epoch (loss 0.4531):  50%|█████     | 423/840 [26:40<22:54,  3.30s/it]
Training 2/2 epoch (loss 0.4219):  50%|█████     | 423/840 [26:43<22:54,  3.30s/it]
Training 2/2 epoch (loss 0.4219):  50%|█████     | 424/840 [26:43<22:08,  3.19s/it]
Training 2/2 epoch (loss 0.4805):  50%|█████     | 424/840 [26:48<22:08,  3.19s/it]
Training 2/2 epoch (loss 0.4805):  51%|█████     | 425/840 [26:48<24:47,  3.58s/it]
Training 2/2 epoch (loss 0.4824):  51%|█████     | 425/840 [26:51<24:47,  3.58s/it]
Training 2/2 epoch (loss 0.4824):  51%|█████     | 426/840 [26:51<24:30,  3.55s/it]
Training 2/2 epoch (loss 0.5469):  51%|█████     | 426/840 [26:55<24:30,  3.55s/it]
Training 2/2 epoch (loss 0.5469):  51%|█████     | 427/840 [26:55<25:04,  3.64s/it]
Training 2/2 epoch (loss 0.4316):  51%|█████     | 427/840 [27:01<25:04,  3.64s/it]
Training 2/2 epoch (loss 0.4316):  51%|█████     | 428/840 [27:01<28:56,  4.22s/it]
Training 2/2 epoch (loss 0.5508):  51%|█████     | 428/840 [27:05<28:56,  4.22s/it]
Training 2/2 epoch (loss 0.5508):  51%|█████     | 429/840 [27:05<28:23,  4.14s/it]
Training 2/2 epoch (loss 0.4492):  51%|█████     | 429/840 [27:08<28:23,  4.14s/it]
Training 2/2 epoch (loss 0.4492):  51%|█████     | 430/840 [27:08<26:10,  3.83s/it]
Training 2/2 epoch (loss 0.6016):  51%|█████     | 430/840 [27:10<26:10,  3.83s/it]
Training 2/2 epoch (loss 0.6016):  51%|█████▏    | 431/840 [27:10<23:31,  3.45s/it]
Training 2/2 epoch (loss 0.4941):  51%|█████▏    | 431/840 [27:14<23:31,  3.45s/it]
Training 2/2 epoch (loss 0.4941):  51%|█████▏    | 432/840 [27:14<24:52,  3.66s/it]
Training 2/2 epoch (loss 0.5391):  51%|█████▏    | 432/840 [27:17<24:52,  3.66s/it]
Training 2/2 epoch (loss 0.5391):  52%|█████▏    | 433/840 [27:17<22:42,  3.35s/it]
Training 2/2 epoch (loss 0.5312):  52%|█████▏    | 433/840 [27:21<22:42,  3.35s/it]
Training 2/2 epoch (loss 0.5312):  52%|█████▏    | 434/840 [27:21<23:05,  3.41s/it]
Training 2/2 epoch (loss 0.3555):  52%|█████▏    | 434/840 [27:23<23:05,  3.41s/it]
Training 2/2 epoch (loss 0.3555):  52%|█████▏    | 435/840 [27:23<22:00,  3.26s/it]
Training 2/2 epoch (loss 0.4941):  52%|█████▏    | 435/840 [27:27<22:00,  3.26s/it]
Training 2/2 epoch (loss 0.4941):  52%|█████▏    | 436/840 [27:27<22:26,  3.33s/it]
Training 2/2 epoch (loss 0.6172):  52%|█████▏    | 436/840 [27:30<22:26,  3.33s/it]
Training 2/2 epoch (loss 0.6172):  52%|█████▏    | 437/840 [27:30<21:27,  3.20s/it]
Training 2/2 epoch (loss 0.4707):  52%|█████▏    | 437/840 [27:33<21:27,  3.20s/it]
Training 2/2 epoch (loss 0.4707):  52%|█████▏    | 438/840 [27:33<21:37,  3.23s/it]
Training 2/2 epoch (loss 0.4922):  52%|█████▏    | 438/840 [27:37<21:37,  3.23s/it]
Training 2/2 epoch (loss 0.4922):  52%|█████▏    | 439/840 [27:37<23:33,  3.53s/it]
Training 2/2 epoch (loss 0.5039):  52%|█████▏    | 439/840 [27:41<23:33,  3.53s/it]
Training 2/2 epoch (loss 0.5039):  52%|█████▏    | 440/840 [27:41<24:07,  3.62s/it]
Training 2/2 epoch (loss 0.5352):  52%|█████▏    | 440/840 [27:45<24:07,  3.62s/it]
Training 2/2 epoch (loss 0.5352):  52%|█████▎    | 441/840 [27:45<24:06,  3.62s/it]
Training 2/2 epoch (loss 0.5273):  52%|█████▎    | 441/840 [27:48<24:06,  3.62s/it]
Training 2/2 epoch (loss 0.5273):  53%|█████▎    | 442/840 [27:48<22:18,  3.36s/it]
Training 2/2 epoch (loss 0.4746):  53%|█████▎    | 442/840 [27:50<22:18,  3.36s/it]
Training 2/2 epoch (loss 0.4746):  53%|█████▎    | 443/840 [27:50<20:57,  3.17s/it]
Training 2/2 epoch (loss 0.4648):  53%|█████▎    | 443/840 [27:54<20:57,  3.17s/it]
Training 2/2 epoch (loss 0.4648):  53%|█████▎    | 444/840 [27:54<22:08,  3.36s/it]
Training 2/2 epoch (loss 0.4609):  53%|█████▎    | 444/840 [27:59<22:08,  3.36s/it]
Training 2/2 epoch (loss 0.4609):  53%|█████▎    | 445/840 [27:59<24:50,  3.77s/it]
Training 2/2 epoch (loss 0.4258):  53%|█████▎    | 445/840 [28:04<24:50,  3.77s/it]
Training 2/2 epoch (loss 0.4258):  53%|█████▎    | 446/840 [28:04<27:11,  4.14s/it]
Training 2/2 epoch (loss 0.3730):  53%|█████▎    | 446/840 [28:07<27:11,  4.14s/it]
Training 2/2 epoch (loss 0.3730):  53%|█████▎    | 447/840 [28:07<25:43,  3.93s/it]
Training 2/2 epoch (loss 0.3203):  53%|█████▎    | 447/840 [28:11<25:43,  3.93s/it]
Training 2/2 epoch (loss 0.3203):  53%|█████▎    | 448/840 [28:11<24:31,  3.75s/it]
Training 2/2 epoch (loss 0.3047):  53%|█████▎    | 448/840 [28:13<24:31,  3.75s/it]
Training 2/2 epoch (loss 0.3047):  53%|█████▎    | 449/840 [28:13<22:02,  3.38s/it]
Training 2/2 epoch (loss 0.3945):  53%|█████▎    | 449/840 [28:18<22:02,  3.38s/it]
Training 2/2 epoch (loss 0.3945):  54%|█████▎    | 450/840 [28:18<25:03,  3.86s/it]
Training 2/2 epoch (loss 0.4473):  54%|█████▎    | 450/840 [28:23<25:03,  3.86s/it]
Training 2/2 epoch (loss 0.4473):  54%|█████▎    | 451/840 [28:23<26:28,  4.08s/it]
Training 2/2 epoch (loss 0.4277):  54%|█████▎    | 451/840 [28:26<26:28,  4.08s/it]
Training 2/2 epoch (loss 0.4277):  54%|█████▍    | 452/840 [28:26<25:18,  3.91s/it]
Training 2/2 epoch (loss 0.3359):  54%|█████▍    | 452/840 [28:32<25:18,  3.91s/it]
Training 2/2 epoch (loss 0.3359):  54%|█████▍    | 453/840 [28:32<28:06,  4.36s/it]
Training 2/2 epoch (loss 0.2773):  54%|█████▍    | 453/840 [28:35<28:06,  4.36s/it]
Training 2/2 epoch (loss 0.2773):  54%|█████▍    | 454/840 [28:35<26:16,  4.08s/it]
Training 2/2 epoch (loss 0.2598):  54%|█████▍    | 454/840 [28:40<26:16,  4.08s/it]
Training 2/2 epoch (loss 0.2598):  54%|█████▍    | 455/840 [28:40<27:08,  4.23s/it]
Training 2/2 epoch (loss 0.3320):  54%|█████▍    | 455/840 [28:43<27:08,  4.23s/it]
Training 2/2 epoch (loss 0.3320):  54%|█████▍    | 456/840 [28:43<25:50,  4.04s/it]
Training 2/2 epoch (loss 0.2656):  54%|█████▍    | 456/840 [28:46<25:50,  4.04s/it]
Training 2/2 epoch (loss 0.2656):  54%|█████▍    | 457/840 [28:46<24:06,  3.78s/it]
Training 2/2 epoch (loss 0.3066):  54%|█████▍    | 457/840 [28:50<24:06,  3.78s/it]
Training 2/2 epoch (loss 0.3066):  55%|█████▍    | 458/840 [28:50<22:49,  3.58s/it]
Training 2/2 epoch (loss 0.2559):  55%|█████▍    | 458/840 [28:54<22:49,  3.58s/it]
Training 2/2 epoch (loss 0.2559):  55%|█████▍    | 459/840 [28:54<24:53,  3.92s/it]
Training 2/2 epoch (loss 0.1855):  55%|█████▍    | 459/840 [28:59<24:53,  3.92s/it]
Training 2/2 epoch (loss 0.1855):  55%|█████▍    | 460/840 [28:59<27:09,  4.29s/it]
Training 2/2 epoch (loss 0.2266):  55%|█████▍    | 460/840 [29:04<27:09,  4.29s/it]
Training 2/2 epoch (loss 0.2266):  55%|█████▍    | 461/840 [29:04<27:15,  4.31s/it]
Training 2/2 epoch (loss 0.1865):  55%|█████▍    | 461/840 [29:07<27:15,  4.31s/it]
Training 2/2 epoch (loss 0.1865):  55%|█████▌    | 462/840 [29:07<25:49,  4.10s/it]
Training 2/2 epoch (loss 0.2949):  55%|█████▌    | 462/840 [29:12<25:49,  4.10s/it]
Training 2/2 epoch (loss 0.2949):  55%|█████▌    | 463/840 [29:12<26:23,  4.20s/it]
Training 2/2 epoch (loss 0.4043):  55%|█████▌    | 463/840 [29:16<26:23,  4.20s/it]
Training 2/2 epoch (loss 0.4043):  55%|█████▌    | 464/840 [29:16<26:14,  4.19s/it]
Training 2/2 epoch (loss 0.2559):  55%|█████▌    | 464/840 [29:19<26:14,  4.19s/it]
Training 2/2 epoch (loss 0.2559):  55%|█████▌    | 465/840 [29:19<23:08,  3.70s/it]
Training 2/2 epoch (loss 0.3398):  55%|█████▌    | 465/840 [29:24<23:08,  3.70s/it]
Training 2/2 epoch (loss 0.3398):  55%|█████▌    | 466/840 [29:24<26:36,  4.27s/it]
Training 2/2 epoch (loss 0.3789):  55%|█████▌    | 466/840 [29:28<26:36,  4.27s/it]
Training 2/2 epoch (loss 0.3789):  56%|█████▌    | 467/840 [29:28<25:31,  4.11s/it]
Training 2/2 epoch (loss 0.1738):  56%|█████▌    | 467/840 [29:31<25:31,  4.11s/it]
Training 2/2 epoch (loss 0.1738):  56%|█████▌    | 468/840 [29:31<23:29,  3.79s/it]
Training 2/2 epoch (loss 0.5859):  56%|█████▌    | 468/840 [29:34<23:29,  3.79s/it]
Training 2/2 epoch (loss 0.5859):  56%|█████▌    | 469/840 [29:34<21:13,  3.43s/it]
Training 2/2 epoch (loss 0.4824):  56%|█████▌    | 469/840 [29:36<21:13,  3.43s/it]
Training 2/2 epoch (loss 0.4824):  56%|█████▌    | 470/840 [29:36<19:50,  3.22s/it]
Training 2/2 epoch (loss 0.4863):  56%|█████▌    | 470/840 [29:40<19:50,  3.22s/it]
Training 2/2 epoch (loss 0.4863):  56%|█████▌    | 471/840 [29:40<21:26,  3.49s/it]
Training 2/2 epoch (loss 0.3633):  56%|█████▌    | 471/840 [29:44<21:26,  3.49s/it]
Training 2/2 epoch (loss 0.3633):  56%|█████▌    | 472/840 [29:44<20:53,  3.41s/it]
Training 2/2 epoch (loss 0.1660):  56%|█████▌    | 472/840 [29:46<20:53,  3.41s/it]
Training 2/2 epoch (loss 0.1660):  56%|█████▋    | 473/840 [29:46<19:55,  3.26s/it]
Training 2/2 epoch (loss 0.2852):  56%|█████▋    | 473/840 [29:50<19:55,  3.26s/it]
Training 2/2 epoch (loss 0.2852):  56%|█████▋    | 474/840 [29:50<19:38,  3.22s/it]
Training 2/2 epoch (loss 0.2109):  56%|█████▋    | 474/840 [29:53<19:38,  3.22s/it]
Training 2/2 epoch (loss 0.2109):  57%|█████▋    | 475/840 [29:53<19:59,  3.29s/it]
Training 2/2 epoch (loss 0.3262):  57%|█████▋    | 475/840 [29:56<19:59,  3.29s/it]
Training 2/2 epoch (loss 0.3262):  57%|█████▋    | 476/840 [29:56<19:39,  3.24s/it]
Training 2/2 epoch (loss 0.1973):  57%|█████▋    | 476/840 [29:59<19:39,  3.24s/it]
Training 2/2 epoch (loss 0.1973):  57%|█████▋    | 477/840 [29:59<19:10,  3.17s/it]
Training 2/2 epoch (loss 0.1631):  57%|█████▋    | 477/840 [30:03<19:10,  3.17s/it]
Training 2/2 epoch (loss 0.1631):  57%|█████▋    | 478/840 [30:03<19:47,  3.28s/it]
Training 2/2 epoch (loss 0.2734):  57%|█████▋    | 478/840 [30:07<19:47,  3.28s/it]
Training 2/2 epoch (loss 0.2734):  57%|█████▋    | 479/840 [30:07<22:11,  3.69s/it]
Training 2/2 epoch (loss 0.2119):  57%|█████▋    | 479/840 [30:13<22:11,  3.69s/it]
Training 2/2 epoch (loss 0.2119):  57%|█████▋    | 480/840 [30:13<25:15,  4.21s/it]
Training 2/2 epoch (loss 0.1836):  57%|█████▋    | 480/840 [30:16<25:15,  4.21s/it]
Training 2/2 epoch (loss 0.1836):  57%|█████▋    | 481/840 [30:16<23:22,  3.91s/it]
Training 2/2 epoch (loss 0.3164):  57%|█████▋    | 481/840 [30:20<23:22,  3.91s/it]
Training 2/2 epoch (loss 0.3164):  57%|█████▋    | 482/840 [30:20<23:15,  3.90s/it]
Training 2/2 epoch (loss 0.3164):  57%|█████▋    | 482/840 [30:25<23:15,  3.90s/it]
Training 2/2 epoch (loss 0.3164):  57%|█████▊    | 483/840 [30:25<25:53,  4.35s/it]
Training 2/2 epoch (loss 0.3926):  57%|█████▊    | 483/840 [30:30<25:53,  4.35s/it]
Training 2/2 epoch (loss 0.3926):  58%|█████▊    | 484/840 [30:30<26:14,  4.42s/it]
Training 2/2 epoch (loss 0.1250):  58%|█████▊    | 484/840 [30:33<26:14,  4.42s/it]
Training 2/2 epoch (loss 0.1250):  58%|█████▊    | 485/840 [30:33<23:21,  3.95s/it]
Training 2/2 epoch (loss 0.2363):  58%|█████▊    | 485/840 [30:36<23:21,  3.95s/it]
Training 2/2 epoch (loss 0.2363):  58%|█████▊    | 486/840 [30:36<21:15,  3.60s/it]
Training 2/2 epoch (loss 0.2656):  58%|█████▊    | 486/840 [30:40<21:15,  3.60s/it]
Training 2/2 epoch (loss 0.2656):  58%|█████▊    | 487/840 [30:40<21:56,  3.73s/it]
Training 2/2 epoch (loss 0.2578):  58%|█████▊    | 487/840 [30:44<21:56,  3.73s/it]
Training 2/2 epoch (loss 0.2578):  58%|█████▊    | 488/840 [30:44<22:53,  3.90s/it]
Training 2/2 epoch (loss 0.2305):  58%|█████▊    | 488/840 [30:46<22:53,  3.90s/it]
Training 2/2 epoch (loss 0.2305):  58%|█████▊    | 489/840 [30:46<20:34,  3.52s/it]
Training 2/2 epoch (loss 0.1226):  58%|█████▊    | 489/840 [30:51<20:34,  3.52s/it]
Training 2/2 epoch (loss 0.1226):  58%|█████▊    | 490/840 [30:51<21:39,  3.71s/it]
Training 2/2 epoch (loss 0.1279):  58%|█████▊    | 490/840 [30:54<21:39,  3.71s/it]
Training 2/2 epoch (loss 0.1279):  58%|█████▊    | 491/840 [30:54<20:55,  3.60s/it]
Training 2/2 epoch (loss 0.1582):  58%|█████▊    | 491/840 [31:00<20:55,  3.60s/it]
Training 2/2 epoch (loss 0.1582):  59%|█████▊    | 492/840 [31:00<24:20,  4.20s/it]
Training 2/2 epoch (loss 0.1992):  59%|█████▊    | 492/840 [31:05<24:20,  4.20s/it]
Training 2/2 epoch (loss 0.1992):  59%|█████▊    | 493/840 [31:05<26:23,  4.56s/it]
Training 2/2 epoch (loss 0.2637):  59%|█████▊    | 493/840 [31:10<26:23,  4.56s/it]
Training 2/2 epoch (loss 0.2637):  59%|█████▉    | 494/840 [31:10<26:30,  4.60s/it]
Training 2/2 epoch (loss 0.2148):  59%|█████▉    | 494/840 [31:15<26:30,  4.60s/it]
Training 2/2 epoch (loss 0.2148):  59%|█████▉    | 495/840 [31:15<27:55,  4.86s/it]
Training 2/2 epoch (loss 0.2617):  59%|█████▉    | 495/840 [31:18<27:55,  4.86s/it]
Training 2/2 epoch (loss 0.2617):  59%|█████▉    | 496/840 [31:18<24:04,  4.20s/it]
Training 2/2 epoch (loss 0.1787):  59%|█████▉    | 496/840 [31:21<24:04,  4.20s/it]
Training 2/2 epoch (loss 0.1787):  59%|█████▉    | 497/840 [31:21<22:12,  3.89s/it]
Training 2/2 epoch (loss 0.1924):  59%|█████▉    | 497/840 [31:24<22:12,  3.89s/it]
Training 2/2 epoch (loss 0.1924):  59%|█████▉    | 498/840 [31:24<20:14,  3.55s/it]
Training 2/2 epoch (loss 0.1426):  59%|█████▉    | 498/840 [31:28<20:14,  3.55s/it]
Training 2/2 epoch (loss 0.1426):  59%|█████▉    | 499/840 [31:28<21:42,  3.82s/it]
Training 2/2 epoch (loss 0.1050):  59%|█████▉    | 499/840 [31:31<21:42,  3.82s/it]
Training 2/2 epoch (loss 0.1050):  60%|█████▉    | 500/840 [31:31<20:32,  3.62s/it]
Training 2/2 epoch (loss 0.1982):  60%|█████▉    | 500/840 [31:35<20:32,  3.62s/it]
Training 2/2 epoch (loss 0.1982):  60%|█████▉    | 501/840 [31:35<19:59,  3.54s/it]
Training 2/2 epoch (loss 0.1118):  60%|█████▉    | 501/840 [31:38<19:59,  3.54s/it]
Training 2/2 epoch (loss 0.1118):  60%|█████▉    | 502/840 [31:38<19:31,  3.47s/it]
Training 2/2 epoch (loss 0.1621):  60%|█████▉    | 502/840 [31:42<19:31,  3.47s/it]
Training 2/2 epoch (loss 0.1621):  60%|█████▉    | 503/840 [31:42<21:04,  3.75s/it]
Training 2/2 epoch (loss 0.1484):  60%|█████▉    | 503/840 [31:46<21:04,  3.75s/it]
Training 2/2 epoch (loss 0.1484):  60%|██████    | 504/840 [31:46<21:13,  3.79s/it]
Training 2/2 epoch (loss 0.4121):  60%|██████    | 504/840 [31:49<21:13,  3.79s/it]
Training 2/2 epoch (loss 0.4121):  60%|██████    | 505/840 [31:49<19:41,  3.53s/it]
Training 2/2 epoch (loss 0.2295):  60%|██████    | 505/840 [31:52<19:41,  3.53s/it]
Training 2/2 epoch (loss 0.2295):  60%|██████    | 506/840 [31:52<18:21,  3.30s/it]
Training 2/2 epoch (loss 0.2002):  60%|██████    | 506/840 [31:57<18:21,  3.30s/it]
Training 2/2 epoch (loss 0.2002):  60%|██████    | 507/840 [31:57<21:46,  3.92s/it]
Training 2/2 epoch (loss 0.1758):  60%|██████    | 507/840 [32:01<21:46,  3.92s/it]
Training 2/2 epoch (loss 0.1758):  60%|██████    | 508/840 [32:01<21:58,  3.97s/it]
Training 2/2 epoch (loss 0.1943):  60%|██████    | 508/840 [32:05<21:58,  3.97s/it]
Training 2/2 epoch (loss 0.1943):  61%|██████    | 509/840 [32:05<21:03,  3.82s/it]
Training 2/2 epoch (loss 0.1816):  61%|██████    | 509/840 [32:08<21:03,  3.82s/it]
Training 2/2 epoch (loss 0.1816):  61%|██████    | 510/840 [32:08<20:25,  3.71s/it]
Training 2/2 epoch (loss 0.2637):  61%|██████    | 510/840 [32:11<20:25,  3.71s/it]
Training 2/2 epoch (loss 0.2637):  61%|██████    | 511/840 [32:11<18:45,  3.42s/it]
Training 2/2 epoch (loss 0.3730):  61%|██████    | 511/840 [32:14<18:45,  3.42s/it]
Training 2/2 epoch (loss 0.3730):  61%|██████    | 512/840 [32:14<18:16,  3.34s/it]
Training 2/2 epoch (loss 0.1030):  61%|██████    | 512/840 [32:17<18:16,  3.34s/it]
Training 2/2 epoch (loss 0.1030):  61%|██████    | 513/840 [32:17<17:15,  3.17s/it]
Training 2/2 epoch (loss 0.1602):  61%|██████    | 513/840 [32:20<17:15,  3.17s/it]
Training 2/2 epoch (loss 0.1602):  61%|██████    | 514/840 [32:20<17:17,  3.18s/it]
Training 2/2 epoch (loss 0.1680):  61%|██████    | 514/840 [32:24<17:17,  3.18s/it]
Training 2/2 epoch (loss 0.1680):  61%|██████▏   | 515/840 [32:24<18:14,  3.37s/it]
Training 2/2 epoch (loss 0.1016):  61%|██████▏   | 515/840 [32:30<18:14,  3.37s/it]
Training 2/2 epoch (loss 0.1016):  61%|██████▏   | 516/840 [32:30<21:48,  4.04s/it]
Training 2/2 epoch (loss 0.0908):  61%|██████▏   | 516/840 [32:34<21:48,  4.04s/it]
Training 2/2 epoch (loss 0.0908):  62%|██████▏   | 517/840 [32:34<21:42,  4.03s/it]
Training 2/2 epoch (loss 0.1963):  62%|██████▏   | 517/840 [32:38<21:42,  4.03s/it]
Training 2/2 epoch (loss 0.1963):  62%|██████▏   | 518/840 [32:38<21:52,  4.07s/it]
Training 2/2 epoch (loss 0.1475):  62%|██████▏   | 518/840 [32:41<21:52,  4.07s/it]
Training 2/2 epoch (loss 0.1475):  62%|██████▏   | 519/840 [32:41<19:49,  3.71s/it]
Training 2/2 epoch (loss 0.1396):  62%|██████▏   | 519/840 [32:44<19:49,  3.71s/it]
Training 2/2 epoch (loss 0.1396):  62%|██████▏   | 520/840 [32:44<18:43,  3.51s/it]
Training 2/2 epoch (loss 0.1396):  62%|██████▏   | 520/840 [32:49<18:43,  3.51s/it]
Training 2/2 epoch (loss 0.1396):  62%|██████▏   | 521/840 [32:49<21:46,  4.10s/it]
Training 2/2 epoch (loss 0.2754):  62%|██████▏   | 521/840 [32:54<21:46,  4.10s/it]
Training 2/2 epoch (loss 0.2754):  62%|██████▏   | 522/840 [32:54<22:23,  4.22s/it]
Training 2/2 epoch (loss 0.2168):  62%|██████▏   | 522/840 [32:57<22:23,  4.22s/it]
Training 2/2 epoch (loss 0.2168):  62%|██████▏   | 523/840 [32:57<20:29,  3.88s/it]
Training 2/2 epoch (loss 0.2100):  62%|██████▏   | 523/840 [33:00<20:29,  3.88s/it]
Training 2/2 epoch (loss 0.2100):  62%|██████▏   | 524/840 [33:00<20:01,  3.80s/it]
Training 2/2 epoch (loss 0.2090):  62%|██████▏   | 524/840 [33:04<20:01,  3.80s/it]
Training 2/2 epoch (loss 0.2090):  62%|██████▎   | 525/840 [33:04<20:03,  3.82s/it]
Training 2/2 epoch (loss 0.3457):  62%|██████▎   | 525/840 [33:09<20:03,  3.82s/it]
Training 2/2 epoch (loss 0.3457):  63%|██████▎   | 526/840 [33:09<21:05,  4.03s/it]
Training 2/2 epoch (loss 0.1553):  63%|██████▎   | 526/840 [33:14<21:05,  4.03s/it]
Training 2/2 epoch (loss 0.1553):  63%|██████▎   | 527/840 [33:14<23:07,  4.43s/it]
Training 2/2 epoch (loss 0.1738):  63%|██████▎   | 527/840 [33:17<23:07,  4.43s/it]
Training 2/2 epoch (loss 0.1738):  63%|██████▎   | 528/840 [33:17<21:12,  4.08s/it]
Training 2/2 epoch (loss 0.1709):  63%|██████▎   | 528/840 [33:21<21:12,  4.08s/it]
Training 2/2 epoch (loss 0.1709):  63%|██████▎   | 529/840 [33:21<19:40,  3.80s/it]
Training 2/2 epoch (loss 0.1895):  63%|██████▎   | 529/840 [33:26<19:40,  3.80s/it]
Training 2/2 epoch (loss 0.1895):  63%|██████▎   | 530/840 [33:26<22:10,  4.29s/it]
Training 2/2 epoch (loss 0.3242):  63%|██████▎   | 530/840 [33:29<22:10,  4.29s/it]
Training 2/2 epoch (loss 0.3242):  63%|██████▎   | 531/840 [33:29<19:33,  3.80s/it]
Training 2/2 epoch (loss 0.2256):  63%|██████▎   | 531/840 [33:34<19:33,  3.80s/it]
Training 2/2 epoch (loss 0.2256):  63%|██████▎   | 532/840 [33:34<22:07,  4.31s/it]
Training 2/2 epoch (loss 0.3926):  63%|██████▎   | 532/840 [33:40<22:07,  4.31s/it]
Training 2/2 epoch (loss 0.3926):  63%|██████▎   | 533/840 [33:40<23:48,  4.65s/it]
Training 2/2 epoch (loss 0.2246):  63%|██████▎   | 533/840 [33:43<23:48,  4.65s/it]
Training 2/2 epoch (loss 0.2246):  64%|██████▎   | 534/840 [33:43<21:21,  4.19s/it]
Training 2/2 epoch (loss 0.4336):  64%|██████▎   | 534/840 [33:46<21:21,  4.19s/it]
Training 2/2 epoch (loss 0.4336):  64%|██████▎   | 535/840 [33:46<20:31,  4.04s/it]
Training 2/2 epoch (loss 0.2637):  64%|██████▎   | 535/840 [33:50<20:31,  4.04s/it]
Training 2/2 epoch (loss 0.2637):  64%|██████▍   | 536/840 [33:50<19:36,  3.87s/it]
Training 2/2 epoch (loss 0.2715):  64%|██████▍   | 536/840 [33:53<19:36,  3.87s/it]
Training 2/2 epoch (loss 0.2715):  64%|██████▍   | 537/840 [33:53<18:33,  3.67s/it]
Training 2/2 epoch (loss 0.1426):  64%|██████▍   | 537/840 [33:57<18:33,  3.67s/it]
Training 2/2 epoch (loss 0.1426):  64%|██████▍   | 538/840 [33:57<19:27,  3.87s/it]
Training 2/2 epoch (loss 0.1660):  64%|██████▍   | 538/840 [34:02<19:27,  3.87s/it]
Training 2/2 epoch (loss 0.1660):  64%|██████▍   | 539/840 [34:02<20:38,  4.11s/it]
Training 2/2 epoch (loss 0.2246):  64%|██████▍   | 539/840 [34:05<20:38,  4.11s/it]
Training 2/2 epoch (loss 0.2246):  64%|██████▍   | 540/840 [34:05<19:13,  3.84s/it]
Training 2/2 epoch (loss 0.1416):  64%|██████▍   | 540/840 [34:09<19:13,  3.84s/it]
Training 2/2 epoch (loss 0.1416):  64%|██████▍   | 541/840 [34:09<19:11,  3.85s/it]
Training 2/2 epoch (loss 0.1367):  64%|██████▍   | 541/840 [34:13<19:11,  3.85s/it]
Training 2/2 epoch (loss 0.1367):  65%|██████▍   | 542/840 [34:13<18:58,  3.82s/it]
Training 2/2 epoch (loss 0.2119):  65%|██████▍   | 542/840 [34:16<18:58,  3.82s/it]
Training 2/2 epoch (loss 0.2119):  65%|██████▍   | 543/840 [34:16<17:51,  3.61s/it]
Training 2/2 epoch (loss 0.1172):  65%|██████▍   | 543/840 [34:20<17:51,  3.61s/it]
Training 2/2 epoch (loss 0.1172):  65%|██████▍   | 544/840 [34:20<18:32,  3.76s/it]
Training 2/2 epoch (loss 0.1660):  65%|██████▍   | 544/840 [34:23<18:32,  3.76s/it]
Training 2/2 epoch (loss 0.1660):  65%|██████▍   | 545/840 [34:23<16:48,  3.42s/it]
Training 2/2 epoch (loss 0.1953):  65%|██████▍   | 545/840 [34:26<16:48,  3.42s/it]
Training 2/2 epoch (loss 0.1953):  65%|██████▌   | 546/840 [34:26<16:23,  3.34s/it]
Training 2/2 epoch (loss 0.1216):  65%|██████▌   | 546/840 [34:29<16:23,  3.34s/it]
Training 2/2 epoch (loss 0.1216):  65%|██████▌   | 547/840 [34:29<16:03,  3.29s/it]
Training 2/2 epoch (loss 0.3730):  65%|██████▌   | 547/840 [34:35<16:03,  3.29s/it]
Training 2/2 epoch (loss 0.3730):  65%|██████▌   | 548/840 [34:35<19:12,  3.95s/it]
Training 2/2 epoch (loss 0.2432):  65%|██████▌   | 548/840 [34:37<19:12,  3.95s/it]
Training 2/2 epoch (loss 0.2432):  65%|██████▌   | 549/840 [34:37<17:41,  3.65s/it]
Training 2/2 epoch (loss 0.2100):  65%|██████▌   | 549/840 [34:41<17:41,  3.65s/it]
Training 2/2 epoch (loss 0.2100):  65%|██████▌   | 550/840 [34:41<16:47,  3.47s/it]
Training 2/2 epoch (loss 0.1396):  65%|██████▌   | 550/840 [34:43<16:47,  3.47s/it]
Training 2/2 epoch (loss 0.1396):  66%|██████▌   | 551/840 [34:43<15:38,  3.25s/it]
Training 2/2 epoch (loss 0.1216):  66%|██████▌   | 551/840 [34:48<15:38,  3.25s/it]
Training 2/2 epoch (loss 0.1216):  66%|██████▌   | 552/840 [34:48<17:54,  3.73s/it]
Training 2/2 epoch (loss 0.2656):  66%|██████▌   | 552/840 [34:52<17:54,  3.73s/it]
Training 2/2 epoch (loss 0.2656):  66%|██████▌   | 553/840 [34:52<18:14,  3.81s/it]
Training 2/2 epoch (loss 0.1924):  66%|██████▌   | 553/840 [34:57<18:14,  3.81s/it]
Training 2/2 epoch (loss 0.1924):  66%|██████▌   | 554/840 [34:57<19:05,  4.00s/it]
Training 2/2 epoch (loss 0.1299):  66%|██████▌   | 554/840 [35:00<19:05,  4.00s/it]
Training 2/2 epoch (loss 0.1299):  66%|██████▌   | 555/840 [35:00<17:44,  3.74s/it]
Training 2/2 epoch (loss 0.2168):  66%|██████▌   | 555/840 [35:04<17:44,  3.74s/it]
Training 2/2 epoch (loss 0.2168):  66%|██████▌   | 556/840 [35:04<17:52,  3.78s/it]
Training 2/2 epoch (loss 0.1455):  66%|██████▌   | 556/840 [35:06<17:52,  3.78s/it]
Training 2/2 epoch (loss 0.1455):  66%|██████▋   | 557/840 [35:06<16:07,  3.42s/it]
Training 2/2 epoch (loss 0.1016):  66%|██████▋   | 557/840 [35:11<16:07,  3.42s/it]
Training 2/2 epoch (loss 0.1016):  66%|██████▋   | 558/840 [35:11<17:52,  3.80s/it]
Training 2/2 epoch (loss 0.1484):  66%|██████▋   | 558/840 [35:15<17:52,  3.80s/it]
Training 2/2 epoch (loss 0.1484):  67%|██████▋   | 559/840 [35:15<18:08,  3.88s/it]
Training 2/2 epoch (loss 0.2012):  67%|██████▋   | 559/840 [35:18<18:08,  3.88s/it]
Training 2/2 epoch (loss 0.2012):  67%|██████▋   | 560/840 [35:18<17:21,  3.72s/it]
Training 2/2 epoch (loss 0.0820):  67%|██████▋   | 560/840 [35:23<17:21,  3.72s/it]
Training 2/2 epoch (loss 0.0820):  67%|██████▋   | 561/840 [35:23<18:45,  4.03s/it]
Training 2/2 epoch (loss 0.1328):  67%|██████▋   | 561/840 [35:29<18:45,  4.03s/it]
Training 2/2 epoch (loss 0.1328):  67%|██████▋   | 562/840 [35:29<20:44,  4.47s/it]
Training 2/2 epoch (loss 0.1191):  67%|██████▋   | 562/840 [35:32<20:44,  4.47s/it]
Training 2/2 epoch (loss 0.1191):  67%|██████▋   | 563/840 [35:32<19:39,  4.26s/it]
Training 2/2 epoch (loss 0.1128):  67%|██████▋   | 563/840 [35:35<19:39,  4.26s/it]
Training 2/2 epoch (loss 0.1128):  67%|██████▋   | 564/840 [35:35<18:01,  3.92s/it]
Training 2/2 epoch (loss 0.2773):  67%|██████▋   | 564/840 [35:38<18:01,  3.92s/it]
Training 2/2 epoch (loss 0.2773):  67%|██████▋   | 565/840 [35:38<16:38,  3.63s/it]
Training 2/2 epoch (loss 0.3750):  67%|██████▋   | 565/840 [35:43<16:38,  3.63s/it]
Training 2/2 epoch (loss 0.3750):  67%|██████▋   | 566/840 [35:43<18:33,  4.06s/it]
Training 2/2 epoch (loss 0.1719):  67%|██████▋   | 566/840 [35:47<18:33,  4.06s/it]
Training 2/2 epoch (loss 0.1719):  68%|██████▊   | 567/840 [35:47<18:02,  3.97s/it]
Training 2/2 epoch (loss 0.2598):  68%|██████▊   | 567/840 [35:51<18:02,  3.97s/it]
Training 2/2 epoch (loss 0.2598):  68%|██████▊   | 568/840 [35:51<17:52,  3.94s/it]
Training 2/2 epoch (loss 0.2188):  68%|██████▊   | 568/840 [35:55<17:52,  3.94s/it]
Training 2/2 epoch (loss 0.2188):  68%|██████▊   | 569/840 [35:55<17:44,  3.93s/it]
Training 2/2 epoch (loss 0.1738):  68%|██████▊   | 569/840 [35:58<17:44,  3.93s/it]
Training 2/2 epoch (loss 0.1738):  68%|██████▊   | 570/840 [35:58<16:33,  3.68s/it]
Training 2/2 epoch (loss 0.1846):  68%|██████▊   | 570/840 [36:02<16:33,  3.68s/it]
Training 2/2 epoch (loss 0.1846):  68%|██████▊   | 571/840 [36:02<16:30,  3.68s/it]
Training 2/2 epoch (loss 0.1992):  68%|██████▊   | 571/840 [36:05<16:30,  3.68s/it]
Training 2/2 epoch (loss 0.1992):  68%|██████▊   | 572/840 [36:05<16:31,  3.70s/it]
Training 2/2 epoch (loss 0.1885):  68%|██████▊   | 572/840 [36:09<16:31,  3.70s/it]
Training 2/2 epoch (loss 0.1885):  68%|██████▊   | 573/840 [36:09<16:28,  3.70s/it]
Training 2/2 epoch (loss 0.1064):  68%|██████▊   | 573/840 [36:12<16:28,  3.70s/it]
Training 2/2 epoch (loss 0.1064):  68%|██████▊   | 574/840 [36:12<15:46,  3.56s/it]
Training 2/2 epoch (loss 0.2471):  68%|██████▊   | 574/840 [36:17<15:46,  3.56s/it]
Training 2/2 epoch (loss 0.2471):  68%|██████▊   | 575/840 [36:17<17:01,  3.85s/it]
Training 2/2 epoch (loss 0.1533):  68%|██████▊   | 575/840 [36:21<17:01,  3.85s/it]
Training 2/2 epoch (loss 0.1533):  69%|██████▊   | 576/840 [36:21<16:49,  3.82s/it]
Training 2/2 epoch (loss 0.1758):  69%|██████▊   | 576/840 [36:24<16:49,  3.82s/it]
Training 2/2 epoch (loss 0.1758):  69%|██████▊   | 577/840 [36:24<15:55,  3.63s/it]
Training 2/2 epoch (loss 0.2754):  69%|██████▊   | 577/840 [36:27<15:55,  3.63s/it]
Training 2/2 epoch (loss 0.2754):  69%|██████▉   | 578/840 [36:27<14:50,  3.40s/it]
Training 2/2 epoch (loss 0.1699):  69%|██████▉   | 578/840 [36:30<14:50,  3.40s/it]
Training 2/2 epoch (loss 0.1699):  69%|██████▉   | 579/840 [36:30<15:12,  3.50s/it]
Training 2/2 epoch (loss 0.0752):  69%|██████▉   | 579/840 [36:35<15:12,  3.50s/it]
Training 2/2 epoch (loss 0.0752):  69%|██████▉   | 580/840 [36:35<16:09,  3.73s/it]
Training 2/2 epoch (loss 0.3008):  69%|██████▉   | 580/840 [36:39<16:09,  3.73s/it]
Training 2/2 epoch (loss 0.3008):  69%|██████▉   | 581/840 [36:39<17:02,  3.95s/it]
Training 2/2 epoch (loss 0.5117):  69%|██████▉   | 581/840 [36:45<17:02,  3.95s/it]
Training 2/2 epoch (loss 0.5117):  69%|██████▉   | 582/840 [36:45<19:03,  4.43s/it]
Training 2/2 epoch (loss 0.0981):  69%|██████▉   | 582/840 [36:49<19:03,  4.43s/it]
Training 2/2 epoch (loss 0.0981):  69%|██████▉   | 583/840 [36:49<18:40,  4.36s/it]
Training 2/2 epoch (loss 0.1602):  69%|██████▉   | 583/840 [36:52<18:40,  4.36s/it]
Training 2/2 epoch (loss 0.1602):  70%|██████▉   | 584/840 [36:52<17:02,  3.99s/it]
Training 2/2 epoch (loss 0.1006):  70%|██████▉   | 584/840 [36:56<17:02,  3.99s/it]
Training 2/2 epoch (loss 0.1006):  70%|██████▉   | 585/840 [36:56<16:32,  3.89s/it]
Training 2/2 epoch (loss 0.1758):  70%|██████▉   | 585/840 [36:59<16:32,  3.89s/it]
Training 2/2 epoch (loss 0.1758):  70%|██████▉   | 586/840 [36:59<15:51,  3.75s/it]
Training 2/2 epoch (loss 0.2178):  70%|██████▉   | 586/840 [37:02<15:51,  3.75s/it]
Training 2/2 epoch (loss 0.2178):  70%|██████▉   | 587/840 [37:02<14:57,  3.55s/it]
Training 2/2 epoch (loss 0.1338):  70%|██████▉   | 587/840 [37:06<14:57,  3.55s/it]
Training 2/2 epoch (loss 0.1338):  70%|███████   | 588/840 [37:06<14:55,  3.56s/it]
Training 2/2 epoch (loss 0.1318):  70%|███████   | 588/840 [37:09<14:55,  3.56s/it]
Training 2/2 epoch (loss 0.1318):  70%|███████   | 589/840 [37:09<14:36,  3.49s/it]
Training 2/2 epoch (loss 0.1621):  70%|███████   | 589/840 [37:13<14:36,  3.49s/it]
Training 2/2 epoch (loss 0.1621):  70%|███████   | 590/840 [37:13<14:39,  3.52s/it]
Training 2/2 epoch (loss 0.0493):  70%|███████   | 590/840 [37:17<14:39,  3.52s/it]
Training 2/2 epoch (loss 0.0493):  70%|███████   | 591/840 [37:17<15:18,  3.69s/it]
Training 2/2 epoch (loss 0.0947):  70%|███████   | 591/840 [37:20<15:18,  3.69s/it]
Training 2/2 epoch (loss 0.0947):  70%|███████   | 592/840 [37:20<15:05,  3.65s/it]
Training 2/2 epoch (loss 0.1138):  70%|███████   | 592/840 [37:24<15:05,  3.65s/it]
Training 2/2 epoch (loss 0.1138):  71%|███████   | 593/840 [37:24<15:11,  3.69s/it]
Training 2/2 epoch (loss 0.1855):  71%|███████   | 593/840 [37:27<15:11,  3.69s/it]
Training 2/2 epoch (loss 0.1855):  71%|███████   | 594/840 [37:27<14:00,  3.42s/it]
Training 2/2 epoch (loss 0.0664):  71%|███████   | 594/840 [37:30<14:00,  3.42s/it]
Training 2/2 epoch (loss 0.0664):  71%|███████   | 595/840 [37:30<13:05,  3.20s/it]
Training 2/2 epoch (loss 0.1289):  71%|███████   | 595/840 [37:33<13:05,  3.20s/it]
Training 2/2 epoch (loss 0.1289):  71%|███████   | 596/840 [37:33<12:58,  3.19s/it]
Training 2/2 epoch (loss 0.1099):  71%|███████   | 596/840 [37:38<12:58,  3.19s/it]
Training 2/2 epoch (loss 0.1099):  71%|███████   | 597/840 [37:38<15:41,  3.87s/it]
Training 2/2 epoch (loss 0.1865):  71%|███████   | 597/840 [37:42<15:41,  3.87s/it]
Training 2/2 epoch (loss 0.1865):  71%|███████   | 598/840 [37:42<15:20,  3.80s/it]
Training 2/2 epoch (loss 0.2715):  71%|███████   | 598/840 [37:46<15:20,  3.80s/it]
Training 2/2 epoch (loss 0.2715):  71%|███████▏  | 599/840 [37:46<15:46,  3.93s/it]
Training 2/2 epoch (loss 0.0527):  71%|███████▏  | 599/840 [37:49<15:46,  3.93s/it]
Training 2/2 epoch (loss 0.0527):  71%|███████▏  | 600/840 [37:49<14:06,  3.53s/it]
Training 2/2 epoch (loss 0.0483):  71%|███████▏  | 600/840 [37:52<14:06,  3.53s/it]
Training 2/2 epoch (loss 0.0483):  72%|███████▏  | 601/840 [37:52<13:37,  3.42s/it]
Training 2/2 epoch (loss 0.0713):  72%|███████▏  | 601/840 [37:55<13:37,  3.42s/it]
Training 2/2 epoch (loss 0.0713):  72%|███████▏  | 602/840 [37:55<12:44,  3.21s/it]
Training 2/2 epoch (loss 0.0383):  72%|███████▏  | 602/840 [37:58<12:44,  3.21s/it]
Training 2/2 epoch (loss 0.0383):  72%|███████▏  | 603/840 [37:58<12:31,  3.17s/it]
Training 2/2 epoch (loss 0.0771):  72%|███████▏  | 603/840 [38:01<12:31,  3.17s/it]
Training 2/2 epoch (loss 0.0771):  72%|███████▏  | 604/840 [38:01<12:27,  3.17s/it]
Training 2/2 epoch (loss 0.1963):  72%|███████▏  | 604/840 [38:04<12:27,  3.17s/it]
Training 2/2 epoch (loss 0.1963):  72%|███████▏  | 605/840 [38:04<12:50,  3.28s/it]
Training 2/2 epoch (loss 0.0376):  72%|███████▏  | 605/840 [38:09<12:50,  3.28s/it]
Training 2/2 epoch (loss 0.0376):  72%|███████▏  | 606/840 [38:09<14:00,  3.59s/it]
Training 2/2 epoch (loss 0.3691):  72%|███████▏  | 606/840 [38:12<14:00,  3.59s/it]
Training 2/2 epoch (loss 0.3691):  72%|███████▏  | 607/840 [38:12<13:46,  3.55s/it]
Training 2/2 epoch (loss 0.1816):  72%|███████▏  | 607/840 [38:17<13:46,  3.55s/it]
Training 2/2 epoch (loss 0.1816):  72%|███████▏  | 608/840 [38:17<14:47,  3.82s/it]
Training 2/2 epoch (loss 0.2197):  72%|███████▏  | 608/840 [38:20<14:47,  3.82s/it]
Training 2/2 epoch (loss 0.2197):  72%|███████▎  | 609/840 [38:20<13:50,  3.59s/it]
Training 2/2 epoch (loss 0.1250):  72%|███████▎  | 609/840 [38:23<13:50,  3.59s/it]
Training 2/2 epoch (loss 0.1250):  73%|███████▎  | 610/840 [38:23<13:24,  3.50s/it]
Training 2/2 epoch (loss 0.0530):  73%|███████▎  | 610/840 [38:27<13:24,  3.50s/it]
Training 2/2 epoch (loss 0.0530):  73%|███████▎  | 611/840 [38:27<13:44,  3.60s/it]
Training 2/2 epoch (loss 0.0854):  73%|███████▎  | 611/840 [38:30<13:44,  3.60s/it]
Training 2/2 epoch (loss 0.0854):  73%|███████▎  | 612/840 [38:30<13:38,  3.59s/it]
Training 2/2 epoch (loss 0.1357):  73%|███████▎  | 612/840 [38:33<13:38,  3.59s/it]
Training 2/2 epoch (loss 0.1357):  73%|███████▎  | 613/840 [38:33<12:58,  3.43s/it]
Training 2/2 epoch (loss 0.1416):  73%|███████▎  | 613/840 [38:37<12:58,  3.43s/it]
Training 2/2 epoch (loss 0.1416):  73%|███████▎  | 614/840 [38:37<13:20,  3.54s/it]
Training 2/2 epoch (loss 0.1016):  73%|███████▎  | 614/840 [38:40<13:20,  3.54s/it]
Training 2/2 epoch (loss 0.1016):  73%|███████▎  | 615/840 [38:40<12:29,  3.33s/it]
Training 2/2 epoch (loss 0.1040):  73%|███████▎  | 615/840 [38:43<12:29,  3.33s/it]
Training 2/2 epoch (loss 0.1040):  73%|███████▎  | 616/840 [38:43<12:27,  3.34s/it]
Training 2/2 epoch (loss 0.4531):  73%|███████▎  | 616/840 [38:49<12:27,  3.34s/it]
Training 2/2 epoch (loss 0.4531):  73%|███████▎  | 617/840 [38:49<14:48,  3.98s/it]
Training 2/2 epoch (loss 0.3594):  73%|███████▎  | 617/840 [38:53<14:48,  3.98s/it]
Training 2/2 epoch (loss 0.3594):  74%|███████▎  | 618/840 [38:53<14:30,  3.92s/it]
Training 2/2 epoch (loss 0.1992):  74%|███████▎  | 618/840 [38:56<14:30,  3.92s/it]
Training 2/2 epoch (loss 0.1992):  74%|███████▎  | 619/840 [38:56<13:27,  3.66s/it]
Training 2/2 epoch (loss 0.0518):  74%|███████▎  | 619/840 [38:59<13:27,  3.66s/it]
Training 2/2 epoch (loss 0.0518):  74%|███████▍  | 620/840 [38:59<13:09,  3.59s/it]
Training 2/2 epoch (loss 0.0732):  74%|███████▍  | 620/840 [39:03<13:09,  3.59s/it]
Training 2/2 epoch (loss 0.0732):  74%|███████▍  | 621/840 [39:03<13:11,  3.61s/it]
Training 2/2 epoch (loss 0.1299):  74%|███████▍  | 621/840 [39:06<13:11,  3.61s/it]
Training 2/2 epoch (loss 0.1299):  74%|███████▍  | 622/840 [39:06<13:04,  3.60s/it]
Training 2/2 epoch (loss 0.0386):  74%|███████▍  | 622/840 [39:09<13:04,  3.60s/it]
Training 2/2 epoch (loss 0.0386):  74%|███████▍  | 623/840 [39:09<11:55,  3.30s/it]
Training 2/2 epoch (loss 0.2275):  74%|███████▍  | 623/840 [39:13<11:55,  3.30s/it]
Training 2/2 epoch (loss 0.2275):  74%|███████▍  | 624/840 [39:13<12:23,  3.44s/it]
Training 2/2 epoch (loss 0.1367):  74%|███████▍  | 624/840 [39:18<12:23,  3.44s/it]
Training 2/2 epoch (loss 0.1367):  74%|███████▍  | 625/840 [39:18<13:52,  3.87s/it]
Training 2/2 epoch (loss 0.1748):  74%|███████▍  | 625/840 [39:21<13:52,  3.87s/it]
Training 2/2 epoch (loss 0.1748):  75%|███████▍  | 626/840 [39:21<13:21,  3.75s/it]
Training 2/2 epoch (loss 0.0918):  75%|███████▍  | 626/840 [39:24<13:21,  3.75s/it]
Training 2/2 epoch (loss 0.0918):  75%|███████▍  | 627/840 [39:24<12:44,  3.59s/it]
Training 2/2 epoch (loss 0.0654):  75%|███████▍  | 627/840 [39:29<12:44,  3.59s/it]
Training 2/2 epoch (loss 0.0654):  75%|███████▍  | 628/840 [39:29<13:52,  3.93s/it]
Training 2/2 epoch (loss 0.1006):  75%|███████▍  | 628/840 [39:32<13:52,  3.93s/it]
Training 2/2 epoch (loss 0.1006):  75%|███████▍  | 629/840 [39:32<12:34,  3.58s/it]
Training 2/2 epoch (loss 0.1133):  75%|███████▍  | 629/840 [39:35<12:34,  3.58s/it]
Training 2/2 epoch (loss 0.1133):  75%|███████▌  | 630/840 [39:35<12:10,  3.48s/it]
Training 2/2 epoch (loss 0.0520):  75%|███████▌  | 630/840 [39:41<12:10,  3.48s/it]
Training 2/2 epoch (loss 0.0520):  75%|███████▌  | 631/840 [39:41<14:11,  4.07s/it]
Training 2/2 epoch (loss 0.0732):  75%|███████▌  | 631/840 [39:43<14:11,  4.07s/it]
Training 2/2 epoch (loss 0.0732):  75%|███████▌  | 632/840 [39:43<12:59,  3.75s/it]
Training 2/2 epoch (loss 0.0791):  75%|███████▌  | 632/840 [39:47<12:59,  3.75s/it]
Training 2/2 epoch (loss 0.0791):  75%|███████▌  | 633/840 [39:47<12:15,  3.55s/it]
Training 2/2 epoch (loss 0.0435):  75%|███████▌  | 633/840 [39:50<12:15,  3.55s/it]
Training 2/2 epoch (loss 0.0435):  75%|███████▌  | 634/840 [39:50<12:10,  3.54s/it]
Training 2/2 epoch (loss 0.1533):  75%|███████▌  | 634/840 [39:56<12:10,  3.54s/it]
Training 2/2 epoch (loss 0.1533):  76%|███████▌  | 635/840 [39:56<14:03,  4.11s/it]
Training 2/2 epoch (loss 0.1060):  76%|███████▌  | 635/840 [40:00<14:03,  4.11s/it]
Training 2/2 epoch (loss 0.1060):  76%|███████▌  | 636/840 [40:00<13:58,  4.11s/it]
Training 2/2 epoch (loss 0.1250):  76%|███████▌  | 636/840 [40:04<13:58,  4.11s/it]
Training 2/2 epoch (loss 0.1250):  76%|███████▌  | 637/840 [40:04<13:39,  4.03s/it]
Training 2/2 epoch (loss 0.1250):  76%|███████▌  | 637/840 [40:08<13:39,  4.03s/it]
Training 2/2 epoch (loss 0.1250):  76%|███████▌  | 638/840 [40:08<14:31,  4.32s/it]
Training 2/2 epoch (loss 0.0566):  76%|███████▌  | 638/840 [40:12<14:31,  4.32s/it]
Training 2/2 epoch (loss 0.0566):  76%|███████▌  | 639/840 [40:12<13:46,  4.11s/it]
Training 2/2 epoch (loss 0.0253):  76%|███████▌  | 639/840 [40:16<13:46,  4.11s/it]
Training 2/2 epoch (loss 0.0253):  76%|███████▌  | 640/840 [40:16<13:52,  4.16s/it]
Training 2/2 epoch (loss 0.0942):  76%|███████▌  | 640/840 [40:19<13:52,  4.16s/it]
Training 2/2 epoch (loss 0.0942):  76%|███████▋  | 641/840 [40:19<12:22,  3.73s/it]
Training 2/2 epoch (loss 0.0312):  76%|███████▋  | 641/840 [40:24<12:22,  3.73s/it]
Training 2/2 epoch (loss 0.0312):  76%|███████▋  | 642/840 [40:24<13:28,  4.09s/it]
Training 2/2 epoch (loss 0.1504):  76%|███████▋  | 642/840 [40:27<13:28,  4.09s/it]
Training 2/2 epoch (loss 0.1504):  77%|███████▋  | 643/840 [40:27<12:38,  3.85s/it]
Training 2/2 epoch (loss 0.1167):  77%|███████▋  | 643/840 [40:30<12:38,  3.85s/it]
Training 2/2 epoch (loss 0.1167):  77%|███████▋  | 644/840 [40:30<11:17,  3.46s/it]
Training 2/2 epoch (loss 0.0601):  77%|███████▋  | 644/840 [40:34<11:17,  3.46s/it]
Training 2/2 epoch (loss 0.0601):  77%|███████▋  | 645/840 [40:34<12:14,  3.77s/it]
Training 2/2 epoch (loss 0.1670):  77%|███████▋  | 645/840 [40:38<12:14,  3.77s/it]
Training 2/2 epoch (loss 0.1670):  77%|███████▋  | 646/840 [40:38<11:59,  3.71s/it]
Training 2/2 epoch (loss 0.2041):  77%|███████▋  | 646/840 [40:41<11:59,  3.71s/it]
Training 2/2 epoch (loss 0.2041):  77%|███████▋  | 647/840 [40:41<11:18,  3.51s/it]
Training 2/2 epoch (loss 0.1074):  77%|███████▋  | 647/840 [40:47<11:18,  3.51s/it]
Training 2/2 epoch (loss 0.1074):  77%|███████▋  | 648/840 [40:47<13:13,  4.13s/it]
Training 2/2 epoch (loss 0.2090):  77%|███████▋  | 648/840 [40:50<13:13,  4.13s/it]
Training 2/2 epoch (loss 0.2090):  77%|███████▋  | 649/840 [40:50<12:23,  3.89s/it]
Training 2/2 epoch (loss 0.1138):  77%|███████▋  | 649/840 [40:54<12:23,  3.89s/it]
Training 2/2 epoch (loss 0.1138):  77%|███████▋  | 650/840 [40:54<12:54,  4.07s/it]
Training 2/2 epoch (loss 0.1758):  77%|███████▋  | 650/840 [40:57<12:54,  4.07s/it]
Training 2/2 epoch (loss 0.1758):  78%|███████▊  | 651/840 [40:57<11:42,  3.72s/it]
Training 2/2 epoch (loss 0.0535):  78%|███████▊  | 651/840 [41:00<11:42,  3.72s/it]
Training 2/2 epoch (loss 0.0535):  78%|███████▊  | 652/840 [41:00<11:08,  3.56s/it]
Training 2/2 epoch (loss 0.0791):  78%|███████▊  | 652/840 [41:05<11:08,  3.56s/it]
Training 2/2 epoch (loss 0.0791):  78%|███████▊  | 653/840 [41:05<12:03,  3.87s/it]
Training 2/2 epoch (loss 0.0266):  78%|███████▊  | 653/840 [41:08<12:03,  3.87s/it]
Training 2/2 epoch (loss 0.0266):  78%|███████▊  | 654/840 [41:08<11:01,  3.56s/it]
Training 2/2 epoch (loss 0.3711):  78%|███████▊  | 654/840 [41:11<11:01,  3.56s/it]
Training 2/2 epoch (loss 0.3711):  78%|███████▊  | 655/840 [41:11<10:17,  3.34s/it]
Training 2/2 epoch (loss 0.1777):  78%|███████▊  | 655/840 [41:16<10:17,  3.34s/it]
Training 2/2 epoch (loss 0.1777):  78%|███████▊  | 656/840 [41:16<12:18,  4.01s/it]
Training 2/2 epoch (loss 0.0767):  78%|███████▊  | 656/840 [41:21<12:18,  4.01s/it]
Training 2/2 epoch (loss 0.0767):  78%|███████▊  | 657/840 [41:21<12:38,  4.15s/it]
Training 2/2 epoch (loss 0.1006):  78%|███████▊  | 657/840 [41:24<12:38,  4.15s/it]
Training 2/2 epoch (loss 0.1006):  78%|███████▊  | 658/840 [41:24<11:44,  3.87s/it]
Training 2/2 epoch (loss 0.3164):  78%|███████▊  | 658/840 [41:29<11:44,  3.87s/it]
Training 2/2 epoch (loss 0.3164):  78%|███████▊  | 659/840 [41:29<13:04,  4.33s/it]
Training 2/2 epoch (loss 0.1758):  78%|███████▊  | 659/840 [41:33<13:04,  4.33s/it]
Training 2/2 epoch (loss 0.1758):  79%|███████▊  | 660/840 [41:33<12:30,  4.17s/it]
Training 2/2 epoch (loss 0.1099):  79%|███████▊  | 660/840 [41:38<12:30,  4.17s/it]
Training 2/2 epoch (loss 0.1099):  79%|███████▊  | 661/840 [41:38<12:44,  4.27s/it]
Training 2/2 epoch (loss 0.1011):  79%|███████▊  | 661/840 [41:42<12:44,  4.27s/it]
Training 2/2 epoch (loss 0.1011):  79%|███████▉  | 662/840 [41:42<12:25,  4.19s/it]
Training 2/2 epoch (loss 0.1416):  79%|███████▉  | 662/840 [41:45<12:25,  4.19s/it]
Training 2/2 epoch (loss 0.1416):  79%|███████▉  | 663/840 [41:45<11:42,  3.97s/it]
Training 2/2 epoch (loss 0.0640):  79%|███████▉  | 663/840 [41:51<11:42,  3.97s/it]
Training 2/2 epoch (loss 0.0640):  79%|███████▉  | 664/840 [41:51<12:58,  4.42s/it]
Training 2/2 epoch (loss 0.0928):  79%|███████▉  | 664/840 [41:54<12:58,  4.42s/it]
Training 2/2 epoch (loss 0.0928):  79%|███████▉  | 665/840 [41:54<11:50,  4.06s/it]
Training 2/2 epoch (loss 0.2812):  79%|███████▉  | 665/840 [41:57<11:50,  4.06s/it]
Training 2/2 epoch (loss 0.2812):  79%|███████▉  | 666/840 [41:57<10:58,  3.78s/it]
Training 2/2 epoch (loss 0.0664):  79%|███████▉  | 666/840 [42:00<10:58,  3.78s/it]
Training 2/2 epoch (loss 0.0664):  79%|███████▉  | 667/840 [42:00<09:51,  3.42s/it]
Training 2/2 epoch (loss 0.0898):  79%|███████▉  | 667/840 [42:05<09:51,  3.42s/it]
Training 2/2 epoch (loss 0.0898):  80%|███████▉  | 668/840 [42:05<11:38,  4.06s/it]
Training 2/2 epoch (loss 0.1152):  80%|███████▉  | 668/840 [42:08<11:38,  4.06s/it]
Training 2/2 epoch (loss 0.1152):  80%|███████▉  | 669/840 [42:08<10:42,  3.76s/it]
Training 2/2 epoch (loss 0.0559):  80%|███████▉  | 669/840 [42:11<10:42,  3.76s/it]
Training 2/2 epoch (loss 0.0559):  80%|███████▉  | 670/840 [42:11<09:58,  3.52s/it]
Training 2/2 epoch (loss 0.1133):  80%|███████▉  | 670/840 [42:15<09:58,  3.52s/it]
Training 2/2 epoch (loss 0.1133):  80%|███████▉  | 671/840 [42:15<10:03,  3.57s/it]
Training 2/2 epoch (loss 0.0549):  80%|███████▉  | 671/840 [42:18<10:03,  3.57s/it]
Training 2/2 epoch (loss 0.0549):  80%|████████  | 672/840 [42:18<09:42,  3.47s/it]
Training 2/2 epoch (loss 0.0461):  80%|████████  | 672/840 [42:22<09:42,  3.47s/it]
Training 2/2 epoch (loss 0.0461):  80%|████████  | 673/840 [42:22<09:47,  3.52s/it]
Training 2/2 epoch (loss 0.1660):  80%|████████  | 673/840 [42:26<09:47,  3.52s/it]
Training 2/2 epoch (loss 0.1660):  80%|████████  | 674/840 [42:26<10:02,  3.63s/it]
Training 2/2 epoch (loss 0.1143):  80%|████████  | 674/840 [42:29<10:02,  3.63s/it]
Training 2/2 epoch (loss 0.1143):  80%|████████  | 675/840 [42:29<09:43,  3.54s/it]
Training 2/2 epoch (loss 0.0518):  80%|████████  | 675/840 [42:33<09:43,  3.54s/it]
Training 2/2 epoch (loss 0.0518):  80%|████████  | 676/840 [42:33<09:55,  3.63s/it]
Training 2/2 epoch (loss 0.0432):  80%|████████  | 676/840 [42:38<09:55,  3.63s/it]
Training 2/2 epoch (loss 0.0432):  81%|████████  | 677/840 [42:38<10:48,  3.98s/it]
Training 2/2 epoch (loss 0.0664):  81%|████████  | 677/840 [42:41<10:48,  3.98s/it]
Training 2/2 epoch (loss 0.0664):  81%|████████  | 678/840 [42:41<10:13,  3.79s/it]
Training 2/2 epoch (loss 0.1553):  81%|████████  | 678/840 [42:44<10:13,  3.79s/it]
Training 2/2 epoch (loss 0.1553):  81%|████████  | 679/840 [42:44<09:24,  3.51s/it]
Training 2/2 epoch (loss 0.0986):  81%|████████  | 679/840 [42:47<09:24,  3.51s/it]
Training 2/2 epoch (loss 0.0986):  81%|████████  | 680/840 [42:47<09:30,  3.57s/it]
Training 2/2 epoch (loss 0.0806):  81%|████████  | 680/840 [42:50<09:30,  3.57s/it]
Training 2/2 epoch (loss 0.0806):  81%|████████  | 681/840 [42:50<08:41,  3.28s/it]
Training 2/2 epoch (loss 0.0354):  81%|████████  | 681/840 [42:54<08:41,  3.28s/it]
Training 2/2 epoch (loss 0.0354):  81%|████████  | 682/840 [42:54<08:51,  3.36s/it]
Training 2/2 epoch (loss 0.1641):  81%|████████  | 682/840 [42:56<08:51,  3.36s/it]
Training 2/2 epoch (loss 0.1641):  81%|████████▏ | 683/840 [42:56<08:14,  3.15s/it]
Training 2/2 epoch (loss 0.0679):  81%|████████▏ | 683/840 [42:59<08:14,  3.15s/it]
Training 2/2 epoch (loss 0.0679):  81%|████████▏ | 684/840 [42:59<07:43,  2.97s/it]
Training 2/2 epoch (loss 0.2910):  81%|████████▏ | 684/840 [43:02<07:43,  2.97s/it]
Training 2/2 epoch (loss 0.2910):  82%|████████▏ | 685/840 [43:02<07:36,  2.95s/it]
Training 2/2 epoch (loss 0.3105):  82%|████████▏ | 685/840 [43:07<07:36,  2.95s/it]
Training 2/2 epoch (loss 0.3105):  82%|████████▏ | 686/840 [43:07<09:31,  3.71s/it]
Training 2/2 epoch (loss 0.2695):  82%|████████▏ | 686/840 [43:10<09:31,  3.71s/it]
Training 2/2 epoch (loss 0.2695):  82%|████████▏ | 687/840 [43:10<08:55,  3.50s/it]
Training 2/2 epoch (loss 0.0962):  82%|████████▏ | 687/840 [43:13<08:55,  3.50s/it]
Training 2/2 epoch (loss 0.0962):  82%|████████▏ | 688/840 [43:13<08:39,  3.42s/it]
Training 2/2 epoch (loss 0.0732):  82%|████████▏ | 688/840 [43:16<08:39,  3.42s/it]
Training 2/2 epoch (loss 0.0732):  82%|████████▏ | 689/840 [43:16<07:56,  3.15s/it]
Training 2/2 epoch (loss 0.0830):  82%|████████▏ | 689/840 [43:19<07:56,  3.15s/it]
Training 2/2 epoch (loss 0.0830):  82%|████████▏ | 690/840 [43:19<08:01,  3.21s/it]
Training 2/2 epoch (loss 0.0413):  82%|████████▏ | 690/840 [43:24<08:01,  3.21s/it]
Training 2/2 epoch (loss 0.0413):  82%|████████▏ | 691/840 [43:24<09:10,  3.69s/it]
Training 2/2 epoch (loss 0.0435):  82%|████████▏ | 691/840 [43:27<09:10,  3.69s/it]
Training 2/2 epoch (loss 0.0435):  82%|████████▏ | 692/840 [43:27<08:36,  3.49s/it]
Training 2/2 epoch (loss 0.1216):  82%|████████▏ | 692/840 [43:31<08:36,  3.49s/it]
Training 2/2 epoch (loss 0.1216):  82%|████████▎ | 693/840 [43:31<09:05,  3.71s/it]
Training 2/2 epoch (loss 0.0742):  82%|████████▎ | 693/840 [43:35<09:05,  3.71s/it]
Training 2/2 epoch (loss 0.0742):  83%|████████▎ | 694/840 [43:35<09:19,  3.83s/it]
Training 2/2 epoch (loss 0.0598):  83%|████████▎ | 694/840 [43:40<09:19,  3.83s/it]
Training 2/2 epoch (loss 0.0598):  83%|████████▎ | 695/840 [43:40<09:57,  4.12s/it]
Training 2/2 epoch (loss 0.0598):  83%|████████▎ | 695/840 [43:45<09:57,  4.12s/it]
Training 2/2 epoch (loss 0.0598):  83%|████████▎ | 696/840 [43:45<09:58,  4.16s/it]
Training 2/2 epoch (loss 0.1553):  83%|████████▎ | 696/840 [43:48<09:58,  4.16s/it]
Training 2/2 epoch (loss 0.1553):  83%|████████▎ | 697/840 [43:48<09:15,  3.89s/it]
Training 2/2 epoch (loss 0.0801):  83%|████████▎ | 697/840 [43:52<09:15,  3.89s/it]
Training 2/2 epoch (loss 0.0801):  83%|████████▎ | 698/840 [43:52<09:12,  3.89s/it]
Training 2/2 epoch (loss 0.0645):  83%|████████▎ | 698/840 [43:55<09:12,  3.89s/it]
Training 2/2 epoch (loss 0.0645):  83%|████████▎ | 699/840 [43:55<08:57,  3.81s/it]
Training 2/2 epoch (loss 0.1133):  83%|████████▎ | 699/840 [43:58<08:57,  3.81s/it]
Training 2/2 epoch (loss 0.1133):  83%|████████▎ | 700/840 [43:58<08:13,  3.52s/it]
Training 2/2 epoch (loss 0.1035):  83%|████████▎ | 700/840 [44:02<08:13,  3.52s/it]
Training 2/2 epoch (loss 0.1035):  83%|████████▎ | 701/840 [44:02<08:07,  3.51s/it]
Training 2/2 epoch (loss 0.0693):  83%|████████▎ | 701/840 [44:05<08:07,  3.51s/it]
Training 2/2 epoch (loss 0.0693):  84%|████████▎ | 702/840 [44:05<07:47,  3.39s/it]
Training 2/2 epoch (loss 0.0552):  84%|████████▎ | 702/840 [44:09<07:47,  3.39s/it]
Training 2/2 epoch (loss 0.0552):  84%|████████▎ | 703/840 [44:09<08:08,  3.56s/it]
Training 2/2 epoch (loss 0.1104):  84%|████████▎ | 703/840 [44:12<08:08,  3.56s/it]
Training 2/2 epoch (loss 0.1104):  84%|████████▍ | 704/840 [44:12<07:34,  3.34s/it]
Training 2/2 epoch (loss 0.0347):  84%|████████▍ | 704/840 [44:15<07:34,  3.34s/it]
Training 2/2 epoch (loss 0.0347):  84%|████████▍ | 705/840 [44:15<07:47,  3.46s/it]
Training 2/2 epoch (loss 0.0396):  84%|████████▍ | 705/840 [44:19<07:47,  3.46s/it]
Training 2/2 epoch (loss 0.0396):  84%|████████▍ | 706/840 [44:19<07:43,  3.46s/it]
Training 2/2 epoch (loss 0.1357):  84%|████████▍ | 706/840 [44:22<07:43,  3.46s/it]
Training 2/2 epoch (loss 0.1357):  84%|████████▍ | 707/840 [44:22<07:44,  3.49s/it]
Training 2/2 epoch (loss 0.0503):  84%|████████▍ | 707/840 [44:25<07:44,  3.49s/it]
Training 2/2 epoch (loss 0.0503):  84%|████████▍ | 708/840 [44:25<07:14,  3.29s/it]
Training 2/2 epoch (loss 0.0786):  84%|████████▍ | 708/840 [44:28<07:14,  3.29s/it]
Training 2/2 epoch (loss 0.0786):  84%|████████▍ | 709/840 [44:28<07:02,  3.23s/it]
Training 2/2 epoch (loss 0.1523):  84%|████████▍ | 709/840 [44:32<07:02,  3.23s/it]
Training 2/2 epoch (loss 0.1523):  85%|████████▍ | 710/840 [44:32<07:14,  3.35s/it]
Training 2/2 epoch (loss 0.0143):  85%|████████▍ | 710/840 [44:36<07:14,  3.35s/it]
Training 2/2 epoch (loss 0.0143):  85%|████████▍ | 711/840 [44:36<07:47,  3.62s/it]
Training 2/2 epoch (loss 0.0266):  85%|████████▍ | 711/840 [44:41<07:47,  3.62s/it]
Training 2/2 epoch (loss 0.0266):  85%|████████▍ | 712/840 [44:41<08:32,  4.01s/it]
Training 2/2 epoch (loss 0.2812):  85%|████████▍ | 712/840 [44:45<08:32,  4.01s/it]
Training 2/2 epoch (loss 0.2812):  85%|████████▍ | 713/840 [44:45<08:41,  4.11s/it]
Training 2/2 epoch (loss 0.0557):  85%|████████▍ | 713/840 [44:48<08:41,  4.11s/it]
Training 2/2 epoch (loss 0.0557):  85%|████████▌ | 714/840 [44:48<07:58,  3.80s/it]
Training 2/2 epoch (loss 0.0742):  85%|████████▌ | 714/840 [44:51<07:58,  3.80s/it]
Training 2/2 epoch (loss 0.0742):  85%|████████▌ | 715/840 [44:51<07:15,  3.48s/it]
Training 2/2 epoch (loss 0.0457):  85%|████████▌ | 715/840 [44:55<07:15,  3.48s/it]
Training 2/2 epoch (loss 0.0457):  85%|████████▌ | 716/840 [44:55<07:12,  3.49s/it]
Training 2/2 epoch (loss 0.1079):  85%|████████▌ | 716/840 [44:58<07:12,  3.49s/it]
Training 2/2 epoch (loss 0.1079):  85%|████████▌ | 717/840 [44:58<07:15,  3.54s/it]
Training 2/2 epoch (loss 0.0471):  85%|████████▌ | 717/840 [45:02<07:15,  3.54s/it]
Training 2/2 epoch (loss 0.0471):  85%|████████▌ | 718/840 [45:02<07:20,  3.61s/it]
Training 2/2 epoch (loss 0.1064):  85%|████████▌ | 718/840 [45:06<07:20,  3.61s/it]
Training 2/2 epoch (loss 0.1064):  86%|████████▌ | 719/840 [45:06<07:11,  3.57s/it]
Training 2/2 epoch (loss 0.0182):  86%|████████▌ | 719/840 [45:09<07:11,  3.57s/it]
Training 2/2 epoch (loss 0.0182):  86%|████████▌ | 720/840 [45:09<06:46,  3.39s/it]
Training 2/2 epoch (loss 0.2119):  86%|████████▌ | 720/840 [45:13<06:46,  3.39s/it]
Training 2/2 epoch (loss 0.2119):  86%|████████▌ | 721/840 [45:13<07:33,  3.81s/it]
Training 2/2 epoch (loss 0.1689):  86%|████████▌ | 721/840 [45:17<07:33,  3.81s/it]
Training 2/2 epoch (loss 0.1689):  86%|████████▌ | 722/840 [45:17<07:30,  3.81s/it]
Training 2/2 epoch (loss 0.0479):  86%|████████▌ | 722/840 [45:23<07:30,  3.81s/it]
Training 2/2 epoch (loss 0.0479):  86%|████████▌ | 723/840 [45:23<08:25,  4.32s/it]
Training 2/2 epoch (loss 0.0776):  86%|████████▌ | 723/840 [45:26<08:25,  4.32s/it]
Training 2/2 epoch (loss 0.0776):  86%|████████▌ | 724/840 [45:26<07:45,  4.01s/it]
Training 2/2 epoch (loss 0.0762):  86%|████████▌ | 724/840 [45:30<07:45,  4.01s/it]
Training 2/2 epoch (loss 0.0762):  86%|████████▋ | 725/840 [45:30<07:28,  3.90s/it]
Training 2/2 epoch (loss 0.1045):  86%|████████▋ | 725/840 [45:35<07:28,  3.90s/it]
Training 2/2 epoch (loss 0.1045):  86%|████████▋ | 726/840 [45:35<08:00,  4.21s/it]
Training 2/2 epoch (loss 0.0476):  86%|████████▋ | 726/840 [45:38<08:00,  4.21s/it]
Training 2/2 epoch (loss 0.0476):  87%|████████▋ | 727/840 [45:38<07:34,  4.02s/it]
Training 2/2 epoch (loss 0.0923):  87%|████████▋ | 727/840 [45:41<07:34,  4.02s/it]
Training 2/2 epoch (loss 0.0923):  87%|████████▋ | 728/840 [45:41<07:04,  3.79s/it]
Training 2/2 epoch (loss 0.0688):  87%|████████▋ | 728/840 [45:45<07:04,  3.79s/it]
Training 2/2 epoch (loss 0.0688):  87%|████████▋ | 729/840 [45:45<06:56,  3.75s/it]
Training 2/2 epoch (loss 0.0449):  87%|████████▋ | 729/840 [45:48<06:56,  3.75s/it]
Training 2/2 epoch (loss 0.0449):  87%|████████▋ | 730/840 [45:48<06:34,  3.58s/it]
Training 2/2 epoch (loss 0.0300):  87%|████████▋ | 730/840 [45:52<06:34,  3.58s/it]
Training 2/2 epoch (loss 0.0300):  87%|████████▋ | 731/840 [45:52<06:31,  3.59s/it]
Training 2/2 epoch (loss 0.1279):  87%|████████▋ | 731/840 [45:55<06:31,  3.59s/it]
Training 2/2 epoch (loss 0.1279):  87%|████████▋ | 732/840 [45:55<06:19,  3.52s/it]
Training 2/2 epoch (loss 0.0718):  87%|████████▋ | 732/840 [45:58<06:19,  3.52s/it]
Training 2/2 epoch (loss 0.0718):  87%|████████▋ | 733/840 [45:58<05:42,  3.20s/it]
Training 2/2 epoch (loss 0.0693):  87%|████████▋ | 733/840 [46:02<05:42,  3.20s/it]
Training 2/2 epoch (loss 0.0693):  87%|████████▋ | 734/840 [46:02<06:08,  3.48s/it]
Training 2/2 epoch (loss 0.0271):  87%|████████▋ | 734/840 [46:06<06:08,  3.48s/it]
Training 2/2 epoch (loss 0.0271):  88%|████████▊ | 735/840 [46:06<06:32,  3.74s/it]
Training 2/2 epoch (loss 0.0532):  88%|████████▊ | 735/840 [46:10<06:32,  3.74s/it]
Training 2/2 epoch (loss 0.0532):  88%|████████▊ | 736/840 [46:10<06:24,  3.70s/it]
Training 2/2 epoch (loss 0.0476):  88%|████████▊ | 736/840 [46:14<06:24,  3.70s/it]
Training 2/2 epoch (loss 0.0476):  88%|████████▊ | 737/840 [46:14<06:39,  3.88s/it]
Training 2/2 epoch (loss 0.1914):  88%|████████▊ | 737/840 [46:17<06:39,  3.88s/it]
Training 2/2 epoch (loss 0.1914):  88%|████████▊ | 738/840 [46:17<06:12,  3.65s/it]
Training 2/2 epoch (loss 0.1006):  88%|████████▊ | 738/840 [46:23<06:12,  3.65s/it]
Training 2/2 epoch (loss 0.1006):  88%|████████▊ | 739/840 [46:23<07:02,  4.18s/it]
Training 2/2 epoch (loss 0.0239):  88%|████████▊ | 739/840 [46:25<07:02,  4.18s/it]
Training 2/2 epoch (loss 0.0239):  88%|████████▊ | 740/840 [46:25<06:20,  3.81s/it]
Training 2/2 epoch (loss 0.0498):  88%|████████▊ | 740/840 [46:31<06:20,  3.81s/it]
Training 2/2 epoch (loss 0.0498):  88%|████████▊ | 741/840 [46:31<07:02,  4.26s/it]
Training 2/2 epoch (loss 0.1182):  88%|████████▊ | 741/840 [46:35<07:02,  4.26s/it]
Training 2/2 epoch (loss 0.1182):  88%|████████▊ | 742/840 [46:35<07:03,  4.32s/it]
Training 2/2 epoch (loss 0.0859):  88%|████████▊ | 742/840 [46:40<07:03,  4.32s/it]
Training 2/2 epoch (loss 0.0859):  88%|████████▊ | 743/840 [46:40<07:06,  4.40s/it]
Training 2/2 epoch (loss 0.0952):  88%|████████▊ | 743/840 [46:43<07:06,  4.40s/it]
Training 2/2 epoch (loss 0.0952):  89%|████████▊ | 744/840 [46:43<06:36,  4.13s/it]
Training 2/2 epoch (loss 0.0583):  89%|████████▊ | 744/840 [46:46<06:36,  4.13s/it]
Training 2/2 epoch (loss 0.0583):  89%|████████▊ | 745/840 [46:46<05:57,  3.76s/it]
Training 2/2 epoch (loss 0.2227):  89%|████████▊ | 745/840 [46:51<05:57,  3.76s/it]
Training 2/2 epoch (loss 0.2227):  89%|████████▉ | 746/840 [46:51<06:20,  4.05s/it]
Training 2/2 epoch (loss 0.0718):  89%|████████▉ | 746/840 [46:56<06:20,  4.05s/it]
Training 2/2 epoch (loss 0.0718):  89%|████████▉ | 747/840 [46:56<06:54,  4.46s/it]
Training 2/2 epoch (loss 0.0410):  89%|████████▉ | 747/840 [47:00<06:54,  4.46s/it]
Training 2/2 epoch (loss 0.0410):  89%|████████▉ | 748/840 [47:00<06:31,  4.26s/it]
Training 2/2 epoch (loss 0.0718):  89%|████████▉ | 748/840 [47:04<06:31,  4.26s/it]
Training 2/2 epoch (loss 0.0718):  89%|████████▉ | 749/840 [47:04<06:02,  3.99s/it]
Training 2/2 epoch (loss 0.0325):  89%|████████▉ | 749/840 [47:07<06:02,  3.99s/it]
Training 2/2 epoch (loss 0.0325):  89%|████████▉ | 750/840 [47:07<05:35,  3.73s/it]
Training 2/2 epoch (loss 0.0166):  89%|████████▉ | 750/840 [47:10<05:35,  3.73s/it]
Training 2/2 epoch (loss 0.0166):  89%|████████▉ | 751/840 [47:10<05:16,  3.56s/it]
Training 2/2 epoch (loss 0.0781):  89%|████████▉ | 751/840 [47:12<05:16,  3.56s/it]
Training 2/2 epoch (loss 0.0781):  90%|████████▉ | 752/840 [47:12<04:48,  3.27s/it]
Training 2/2 epoch (loss 0.0938):  90%|████████▉ | 752/840 [47:18<04:48,  3.27s/it]
Training 2/2 epoch (loss 0.0938):  90%|████████▉ | 753/840 [47:18<05:41,  3.93s/it]
Training 2/2 epoch (loss 0.1172):  90%|████████▉ | 753/840 [47:23<05:41,  3.93s/it]
Training 2/2 epoch (loss 0.1172):  90%|████████▉ | 754/840 [47:23<06:20,  4.42s/it]
Training 2/2 epoch (loss 0.0337):  90%|████████▉ | 754/840 [47:29<06:20,  4.42s/it]
Training 2/2 epoch (loss 0.0337):  90%|████████▉ | 755/840 [47:29<06:43,  4.74s/it]
Training 2/2 epoch (loss 0.0327):  90%|████████▉ | 755/840 [47:33<06:43,  4.74s/it]
Training 2/2 epoch (loss 0.0327):  90%|█████████ | 756/840 [47:33<06:08,  4.39s/it]
Training 2/2 epoch (loss 0.0908):  90%|█████████ | 756/840 [47:36<06:08,  4.39s/it]
Training 2/2 epoch (loss 0.0908):  90%|█████████ | 757/840 [47:36<05:47,  4.19s/it]
Training 2/2 epoch (loss 0.1011):  90%|█████████ | 757/840 [47:39<05:47,  4.19s/it]
Training 2/2 epoch (loss 0.1011):  90%|█████████ | 758/840 [47:39<05:17,  3.87s/it]
Training 2/2 epoch (loss 0.0708):  90%|█████████ | 758/840 [47:43<05:17,  3.87s/it]
Training 2/2 epoch (loss 0.0708):  90%|█████████ | 759/840 [47:43<05:00,  3.71s/it]
Training 2/2 epoch (loss 0.0574):  90%|█████████ | 759/840 [47:47<05:00,  3.71s/it]
Training 2/2 epoch (loss 0.0574):  90%|█████████ | 760/840 [47:47<05:04,  3.80s/it]
Training 2/2 epoch (loss 0.0211):  90%|█████████ | 760/840 [47:50<05:04,  3.80s/it]
Training 2/2 epoch (loss 0.0211):  91%|█████████ | 761/840 [47:50<04:44,  3.61s/it]
Training 2/2 epoch (loss 0.0322):  91%|█████████ | 761/840 [47:54<04:44,  3.61s/it]
Training 2/2 epoch (loss 0.0322):  91%|█████████ | 762/840 [47:54<04:46,  3.67s/it]
Training 2/2 epoch (loss 0.1357):  91%|█████████ | 762/840 [47:57<04:46,  3.67s/it]
Training 2/2 epoch (loss 0.1357):  91%|█████████ | 763/840 [47:57<04:27,  3.47s/it]
Training 2/2 epoch (loss 0.1289):  91%|█████████ | 763/840 [48:01<04:27,  3.47s/it]
Training 2/2 epoch (loss 0.1289):  91%|█████████ | 764/840 [48:01<04:48,  3.79s/it]
Training 2/2 epoch (loss 0.1123):  91%|█████████ | 764/840 [48:04<04:48,  3.79s/it]
Training 2/2 epoch (loss 0.1123):  91%|█████████ | 765/840 [48:04<04:18,  3.44s/it]
Training 2/2 epoch (loss 0.1367):  91%|█████████ | 765/840 [48:07<04:18,  3.44s/it]
Training 2/2 epoch (loss 0.1367):  91%|█████████ | 766/840 [48:07<03:56,  3.20s/it]
Training 2/2 epoch (loss 0.0620):  91%|█████████ | 766/840 [48:09<03:56,  3.20s/it]
Training 2/2 epoch (loss 0.0620):  91%|█████████▏| 767/840 [48:09<03:41,  3.04s/it]
Training 2/2 epoch (loss 0.0105):  91%|█████████▏| 767/840 [48:12<03:41,  3.04s/it]
Training 2/2 epoch (loss 0.0105):  91%|█████████▏| 768/840 [48:12<03:43,  3.11s/it]
Training 2/2 epoch (loss 0.0264):  91%|█████████▏| 768/840 [48:16<03:43,  3.11s/it]
Training 2/2 epoch (loss 0.0264):  92%|█████████▏| 769/840 [48:16<03:59,  3.38s/it]
Training 2/2 epoch (loss 0.0366):  92%|█████████▏| 769/840 [48:19<03:59,  3.38s/it]
Training 2/2 epoch (loss 0.0366):  92%|█████████▏| 770/840 [48:19<03:43,  3.19s/it]
Training 2/2 epoch (loss 0.0591):  92%|█████████▏| 770/840 [48:25<03:43,  3.19s/it]
Training 2/2 epoch (loss 0.0591):  92%|█████████▏| 771/840 [48:25<04:26,  3.86s/it]
Training 2/2 epoch (loss 0.1289):  92%|█████████▏| 771/840 [48:28<04:26,  3.86s/it]
Training 2/2 epoch (loss 0.1289):  92%|█████████▏| 772/840 [48:28<04:06,  3.62s/it]
Training 2/2 epoch (loss 0.1240):  92%|█████████▏| 772/840 [48:31<04:06,  3.62s/it]
Training 2/2 epoch (loss 0.1240):  92%|█████████▏| 773/840 [48:31<04:02,  3.62s/it]
Training 2/2 epoch (loss 0.0559):  92%|█████████▏| 773/840 [48:35<04:02,  3.62s/it]
Training 2/2 epoch (loss 0.0559):  92%|█████████▏| 774/840 [48:35<03:57,  3.60s/it]
Training 2/2 epoch (loss 0.0889):  92%|█████████▏| 774/840 [48:38<03:57,  3.60s/it]
Training 2/2 epoch (loss 0.0889):  92%|█████████▏| 775/840 [48:38<03:45,  3.47s/it]
Training 2/2 epoch (loss 0.0742):  92%|█████████▏| 775/840 [48:42<03:45,  3.47s/it]
Training 2/2 epoch (loss 0.0742):  92%|█████████▏| 776/840 [48:42<03:52,  3.63s/it]
Training 2/2 epoch (loss 0.0157):  92%|█████████▏| 776/840 [48:47<03:52,  3.63s/it]
Training 2/2 epoch (loss 0.0157):  92%|█████████▎| 777/840 [48:47<04:12,  4.00s/it]
Training 2/2 epoch (loss 0.1455):  92%|█████████▎| 777/840 [48:51<04:12,  4.00s/it]
Training 2/2 epoch (loss 0.1455):  93%|█████████▎| 778/840 [48:51<04:18,  4.17s/it]
Training 2/2 epoch (loss 0.0698):  93%|█████████▎| 778/840 [48:57<04:18,  4.17s/it]
Training 2/2 epoch (loss 0.0698):  93%|█████████▎| 779/840 [48:57<04:37,  4.55s/it]
Training 2/2 epoch (loss 0.0591):  93%|█████████▎| 779/840 [49:01<04:37,  4.55s/it]
Training 2/2 epoch (loss 0.0591):  93%|█████████▎| 780/840 [49:01<04:24,  4.40s/it]
Training 2/2 epoch (loss 0.0077):  93%|█████████▎| 780/840 [49:05<04:24,  4.40s/it]
Training 2/2 epoch (loss 0.0077):  93%|█████████▎| 781/840 [49:05<04:12,  4.28s/it]
Training 2/2 epoch (loss 0.1226):  93%|█████████▎| 781/840 [49:08<04:12,  4.28s/it]
Training 2/2 epoch (loss 0.1226):  93%|█████████▎| 782/840 [49:08<03:43,  3.85s/it]
Training 2/2 epoch (loss 0.0391):  93%|█████████▎| 782/840 [49:11<03:43,  3.85s/it]
Training 2/2 epoch (loss 0.0391):  93%|█████████▎| 783/840 [49:11<03:33,  3.75s/it]
Training 2/2 epoch (loss 0.0635):  93%|█████████▎| 783/840 [49:16<03:33,  3.75s/it]
Training 2/2 epoch (loss 0.0635):  93%|█████████▎| 784/840 [49:16<03:52,  4.15s/it]
Training 2/2 epoch (loss 0.1543):  93%|█████████▎| 784/840 [49:20<03:52,  4.15s/it]
Training 2/2 epoch (loss 0.1543):  93%|█████████▎| 785/840 [49:20<03:40,  4.00s/it]
Training 2/2 epoch (loss 0.0698):  93%|█████████▎| 785/840 [49:26<03:40,  4.00s/it]
Training 2/2 epoch (loss 0.0698):  94%|█████████▎| 786/840 [49:26<04:00,  4.45s/it]
Training 2/2 epoch (loss 0.0493):  94%|█████████▎| 786/840 [49:29<04:00,  4.45s/it]
Training 2/2 epoch (loss 0.0493):  94%|█████████▎| 787/840 [49:29<03:41,  4.17s/it]
Training 2/2 epoch (loss 0.0518):  94%|█████████▎| 787/840 [49:34<03:41,  4.17s/it]
Training 2/2 epoch (loss 0.0518):  94%|█████████▍| 788/840 [49:34<03:43,  4.30s/it]
Training 2/2 epoch (loss 0.2695):  94%|█████████▍| 788/840 [49:37<03:43,  4.30s/it]
Training 2/2 epoch (loss 0.2695):  94%|█████████▍| 789/840 [49:37<03:21,  3.96s/it]
Training 2/2 epoch (loss 0.4727):  94%|█████████▍| 789/840 [49:42<03:21,  3.96s/it]
Training 2/2 epoch (loss 0.4727):  94%|█████████▍| 790/840 [49:42<03:41,  4.43s/it]
Training 2/2 epoch (loss 0.0918):  94%|█████████▍| 790/840 [49:48<03:41,  4.43s/it]
Training 2/2 epoch (loss 0.0918):  94%|█████████▍| 791/840 [49:48<03:52,  4.74s/it]
Training 2/2 epoch (loss 0.0255):  94%|█████████▍| 791/840 [49:53<03:52,  4.74s/it]
Training 2/2 epoch (loss 0.0255):  94%|█████████▍| 792/840 [49:53<03:58,  4.97s/it]
Training 2/2 epoch (loss 0.0214):  94%|█████████▍| 792/840 [49:57<03:58,  4.97s/it]
Training 2/2 epoch (loss 0.0214):  94%|█████████▍| 793/840 [49:57<03:31,  4.50s/it]
Training 2/2 epoch (loss 0.0361):  94%|█████████▍| 793/840 [50:00<03:31,  4.50s/it]
Training 2/2 epoch (loss 0.0361):  95%|█████████▍| 794/840 [50:00<03:13,  4.22s/it]
Training 2/2 epoch (loss 0.0684):  95%|█████████▍| 794/840 [50:05<03:13,  4.22s/it]
Training 2/2 epoch (loss 0.0684):  95%|█████████▍| 795/840 [50:05<03:14,  4.31s/it]
Training 2/2 epoch (loss 0.0306):  95%|█████████▍| 795/840 [50:10<03:14,  4.31s/it]
Training 2/2 epoch (loss 0.0306):  95%|█████████▍| 796/840 [50:10<03:17,  4.49s/it]
Training 2/2 epoch (loss 0.0079):  95%|█████████▍| 796/840 [50:14<03:17,  4.49s/it]
Training 2/2 epoch (loss 0.0079):  95%|█████████▍| 797/840 [50:14<03:06,  4.33s/it]
Training 2/2 epoch (loss 0.0544):  95%|█████████▍| 797/840 [50:17<03:06,  4.33s/it]
Training 2/2 epoch (loss 0.0544):  95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 798/840 [50:17<02:54,  4.16s/it]
Training 2/2 epoch (loss 0.0205):  95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 798/840 [50:20<02:54,  4.16s/it]
Training 2/2 epoch (loss 0.0205):  95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 799/840 [50:20<02:32,  3.72s/it]
Training 2/2 epoch (loss 0.0732):  95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 799/840 [50:25<02:32,  3.72s/it]
Training 2/2 epoch (loss 0.0732):  95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 800/840 [50:25<02:42,  4.05s/it]
Training 2/2 epoch (loss 0.0366):  95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 800/840 [50:28<02:42,  4.05s/it]
Training 2/2 epoch (loss 0.0366):  95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 801/840 [50:28<02:23,  3.67s/it]
Training 2/2 epoch (loss 0.0310):  95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 801/840 [50:33<02:23,  3.67s/it]
Training 2/2 epoch (loss 0.0310):  95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 802/840 [50:33<02:33,  4.05s/it]
Training 2/2 epoch (loss 0.0332):  95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 802/840 [50:36<02:33,  4.05s/it]
Training 2/2 epoch (loss 0.0332):  96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 803/840 [50:36<02:20,  3.79s/it]
Training 2/2 epoch (loss 0.2227):  96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 803/840 [50:39<02:20,  3.79s/it]
Training 2/2 epoch (loss 0.2227):  96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 804/840 [50:39<02:14,  3.73s/it]
Training 2/2 epoch (loss 0.0654):  96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 804/840 [50:43<02:14,  3.73s/it]
Training 2/2 epoch (loss 0.0654):  96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 805/840 [50:43<02:03,  3.53s/it]
Training 2/2 epoch (loss 0.0537):  96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 805/840 [50:46<02:03,  3.53s/it]
Training 2/2 epoch (loss 0.0537):  96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 806/840 [50:46<01:54,  3.37s/it]
Training 2/2 epoch (loss 0.0986):  96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 806/840 [50:48<01:54,  3.37s/it]
Training 2/2 epoch (loss 0.0986):  96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 807/840 [50:48<01:45,  3.20s/it]
Training 2/2 epoch (loss 0.1133):  96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 807/840 [50:53<01:45,  3.20s/it]
Training 2/2 epoch (loss 0.1133):  96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 808/840 [50:53<01:52,  3.52s/it]
Training 2/2 epoch (loss 0.0403):  96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 808/840 [50:55<01:52,  3.52s/it]
Training 2/2 epoch (loss 0.0403):  96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 809/840 [50:55<01:42,  3.29s/it]
Training 2/2 epoch (loss 0.0062):  96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 809/840 [50:58<01:42,  3.29s/it]
Training 2/2 epoch (loss 0.0062):  96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 810/840 [50:58<01:36,  3.21s/it]
Training 2/2 epoch (loss 0.0194):  96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 810/840 [51:02<01:36,  3.21s/it]
Training 2/2 epoch (loss 0.0194):  97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 811/840 [51:02<01:32,  3.19s/it]
Training 2/2 epoch (loss 0.0359):  97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 811/840 [51:04<01:32,  3.19s/it]
Training 2/2 epoch (loss 0.0359):  97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 812/840 [51:04<01:22,  2.96s/it]
Training 2/2 epoch (loss 0.0425):  97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 812/840 [51:07<01:22,  2.96s/it]
Training 2/2 epoch (loss 0.0425):  97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 813/840 [51:07<01:24,  3.12s/it]
Training 2/2 epoch (loss 0.0284):  97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 813/840 [51:11<01:24,  3.12s/it]
Training 2/2 epoch (loss 0.0284):  97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 814/840 [51:11<01:21,  3.12s/it]
Training 2/2 epoch (loss 0.0483):  97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 814/840 [51:13<01:21,  3.12s/it]
Training 2/2 epoch (loss 0.0483):  97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 815/840 [51:13<01:13,  2.96s/it]
Training 2/2 epoch (loss 0.0396):  97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 815/840 [51:17<01:13,  2.96s/it]
Training 2/2 epoch (loss 0.0396):  97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 816/840 [51:17<01:13,  3.08s/it]
Training 2/2 epoch (loss 0.0383):  97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 816/840 [51:20<01:13,  3.08s/it]
Training 2/2 epoch (loss 0.0383):  97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 817/840 [51:20<01:16,  3.33s/it]
Training 2/2 epoch (loss 0.0718):  97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 817/840 [51:24<01:16,  3.33s/it]
Training 2/2 epoch (loss 0.0718):  97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 818/840 [51:24<01:13,  3.35s/it]
Training 2/2 epoch (loss 0.1543):  97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 818/840 [51:29<01:13,  3.35s/it]
Training 2/2 epoch (loss 0.1543):  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 819/840 [51:29<01:23,  3.98s/it]
Training 2/2 epoch (loss 0.0223):  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 819/840 [51:35<01:23,  3.98s/it]
Training 2/2 epoch (loss 0.0223):  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 820/840 [51:35<01:29,  4.47s/it]
Training 2/2 epoch (loss 0.0791):  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 820/840 [51:39<01:29,  4.47s/it]
Training 2/2 epoch (loss 0.0791):  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 821/840 [51:39<01:22,  4.32s/it]
Training 2/2 epoch (loss 0.0574):  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 821/840 [51:44<01:22,  4.32s/it]
Training 2/2 epoch (loss 0.0574):  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 822/840 [51:44<01:24,  4.68s/it]
Training 2/2 epoch (loss 0.0101):  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 822/840 [51:50<01:24,  4.68s/it]
Training 2/2 epoch (loss 0.0101):  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 823/840 [51:50<01:23,  4.90s/it]
Training 2/2 epoch (loss 0.0542):  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 823/840 [51:53<01:23,  4.90s/it]
Training 2/2 epoch (loss 0.0542):  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 824/840 [51:53<01:09,  4.34s/it]
Training 2/2 epoch (loss 0.0564):  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 824/840 [51:55<01:09,  4.34s/it]
Training 2/2 epoch (loss 0.0564):  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 825/840 [51:55<00:56,  3.78s/it]
Training 2/2 epoch (loss 0.1426):  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 825/840 [52:00<00:56,  3.78s/it]
Training 2/2 epoch (loss 0.1426):  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 826/840 [52:00<00:56,  4.03s/it]
Training 2/2 epoch (loss 0.0206):  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 826/840 [52:05<00:56,  4.03s/it]
Training 2/2 epoch (loss 0.0206):  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 827/840 [52:05<00:54,  4.22s/it]
Training 2/2 epoch (loss 0.1108):  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 827/840 [52:09<00:54,  4.22s/it]
Training 2/2 epoch (loss 0.1108):  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 828/840 [52:09<00:52,  4.39s/it]
Training 2/2 epoch (loss 0.0576):  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 828/840 [52:12<00:52,  4.39s/it]
Training 2/2 epoch (loss 0.0576):  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 829/840 [52:12<00:42,  3.86s/it]
Training 2/2 epoch (loss 0.0596):  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 829/840 [52:18<00:42,  3.86s/it]
Training 2/2 epoch (loss 0.0596):  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 830/840 [52:18<00:43,  4.37s/it]
Training 2/2 epoch (loss 0.0610):  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 830/840 [52:20<00:43,  4.37s/it]
Training 2/2 epoch (loss 0.0610):  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 831/840 [52:20<00:35,  3.93s/it]
Training 2/2 epoch (loss 0.0079):  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 831/840 [52:24<00:35,  3.93s/it]
Training 2/2 epoch (loss 0.0079):  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 832/840 [52:24<00:29,  3.73s/it]
Training 2/2 epoch (loss 0.0393):  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 832/840 [52:27<00:29,  3.73s/it]
Training 2/2 epoch (loss 0.0393):  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 833/840 [52:27<00:24,  3.53s/it]
Training 2/2 epoch (loss 0.0503):  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 833/840 [52:31<00:24,  3.53s/it]
Training 2/2 epoch (loss 0.0503):  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 834/840 [52:31<00:22,  3.78s/it]
Training 2/2 epoch (loss 0.0415):  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 834/840 [52:35<00:22,  3.78s/it]
Training 2/2 epoch (loss 0.0415):  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 835/840 [52:35<00:19,  3.91s/it]
Training 2/2 epoch (loss 0.0461):  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 835/840 [52:40<00:19,  3.91s/it]
Training 2/2 epoch (loss 0.0461): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 836/840 [52:40<00:16,  4.15s/it]
Training 2/2 epoch (loss 0.0581): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 836/840 [52:43<00:16,  4.15s/it]
Training 2/2 epoch (loss 0.0581): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 837/840 [52:43<00:11,  3.90s/it]
Training 2/2 epoch (loss 0.0503): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 837/840 [52:47<00:11,  3.90s/it]
Training 2/2 epoch (loss 0.0503): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 838/840 [52:47<00:07,  3.86s/it]
Training 2/2 epoch (loss 0.1348): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 838/840 [52:50<00:07,  3.86s/it]
Training 2/2 epoch (loss 0.1348): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 839/840 [52:50<00:03,  3.63s/it]
Training 2/2 epoch (loss 0.0248): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 839/840 [52:54<00:03,  3.63s/it]
Training 2/2 epoch (loss 0.0248): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 840/840 [52:54<00:00,  3.64s/it]
Training 2/2 epoch (loss 0.0248): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 840/840 [52:54<00:00,  3.78s/it]
/data/jiongxiao_wang/anaconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
wandb: - 0.015 MB of 0.015 MB uploaded
wandb: \ 0.015 MB of 0.033 MB uploaded
wandb: | 0.015 MB of 0.033 MB uploaded
wandb: / 0.033 MB of 0.033 MB uploaded
wandb:                                                                                
wandb: 
wandb: Run history:
wandb: train/accuracy ▂▁▄▅▃▃▄▅▅▅▅▅▄▅▄▅▄▄▅▅▅▆▅▇▇▆▇▇█▇▇█▇███████
wandb:    train/epoch ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb:     train/loss ██▇▇█▇█▇▇▇▇▆▆▆▇▅▆▆▆▆▆▄▅▃▂▃▃▂▁▃▂▁▂▁▂▂▂▁▁▁
wandb:       train/lr ▃███████▇▇▇▇▇▆▆▆▆▅▅▅▅▄▄▄▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁
wandb:     train/step ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb: 
wandb: Run summary:
wandb: train/accuracy 0.98333
wandb:    train/epoch 2.0
wandb:     train/loss 0.02478
wandb:       train/lr 0.0
wandb:     train/step 840
wandb: 
wandb: 🚀 View run reward-2024-01-05-20-03-25 at: https://wandb.ai/jayfeather1024/Safe-RLHF-RM/runs/0bh9htd8
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./output/rm_30k/wandb/run-20240105_200327-0bh9htd8/logs