rm_30k / stderr.log
Jayfeather1024
update
f8ce820
+ deepspeed --num_nodes=1 --num_gpus=4 --master_port 47607 --module safe_rlhf.values.reward --train_datasets PKU-SafeRLHF/train:1.0:PKU-SafeRLHF-harmless-only-30k --eval_datasets PKU-SafeRLHF/test --model_name_or_path output/sft --max_length 512 --trust_remote_code True --loss_type sequence-wise --epochs 2 --per_device_train_batch_size 16 --per_device_eval_batch_size 16 --gradient_accumulation_steps 2 --gradient_checkpointing --normalize_score_during_training False --normalizer_type ExponentialMovingAverage --normalizer_momentum 0.9 --learning_rate 2e-5 --lr_scheduler_type cosine --lr_warmup_ratio 0.03 --weight_decay 0.1 --seed 42 --eval_strategy epoch --output_dir /data/jiongxiao_wang/rlhf_attack/safe-rlhf/output/rm_30k --log_type wandb --log_project Safe-RLHF-RM --zero_stage 3 --bf16 True --tf32 True
2024-01-05 20:02:46.835068: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-05 20:02:46.835067: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-05 20:02:46.835067: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-05 20:02:46.835114: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-05 20:02:46.835114: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-05 20:02:46.835114: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-05 20:02:46.835826: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-05 20:02:46.835865: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-05 20:02:46.836421: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-05 20:02:46.836422: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-05 20:02:46.836424: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-05 20:02:46.836771: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-05 20:02:48.497891: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-01-05 20:02:48.498124: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-01-05 20:02:48.498360: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-01-05 20:02:48.498588: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Some weights of the model checkpoint at output/sft were not used when initializing LlamaModelForScore: ['lm_head.weight']
- This IS expected if you are initializing LlamaModelForScore from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlamaModelForScore from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of the model checkpoint at output/sft were not used when initializing LlamaModelForScore: ['lm_head.weight']
- This IS expected if you are initializing LlamaModelForScore from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlamaModelForScore from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of the model checkpoint at output/sft were not used when initializing LlamaModelForScore: ['lm_head.weight']
- This IS expected if you are initializing LlamaModelForScore from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlamaModelForScore from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of LlamaModelForScore were not initialized from the model checkpoint at output/sft and are newly initialized: ['normalizer.count', 'normalizer.mean', 'normalizer.var']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of LlamaModelForScore were not initialized from the model checkpoint at output/sft and are newly initialized: ['normalizer.count', 'normalizer.var', 'normalizer.mean']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of LlamaModelForScore were not initialized from the model checkpoint at output/sft and are newly initialized: ['normalizer.mean', 'normalizer.var', 'normalizer.count']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
Some weights of the model checkpoint at output/sft were not used when initializing LlamaModelForScore: ['lm_head.weight']
- This IS expected if you are initializing LlamaModelForScore from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlamaModelForScore from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of LlamaModelForScore were not initialized from the model checkpoint at output/sft and are newly initialized: ['normalizer.var', 'normalizer.mean', 'normalizer.count']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
Using /data/jiongxiao_wang/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /data/jiongxiao_wang/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /data/jiongxiao_wang/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /data/jiongxiao_wang/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /data/jiongxiao_wang/.cache/torch_extensions/py310_cu117/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
Loading extension module fused_adam...
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
wandb: Currently logged in as: jayfeather (jayfeather1024). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.1
wandb: Run data is saved locally in /data/jiongxiao_wang/rlhf_attack/safe-rlhf/output/rm_30k/wandb/run-20240105_200327-0bh9htd8
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run reward-2024-01-05-20-03-25
wandb: ⭐️ View project at https://wandb.ai/jayfeather1024/Safe-RLHF-RM
wandb: 🚀 View run at https://wandb.ai/jayfeather1024/Safe-RLHF-RM/runs/0bh9htd8
Training 1/2 epoch: 0%| | 0/840 [00:00<?, ?it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
Training 1/2 epoch (loss 0.6953): 0%| | 0/840 [00:05<?, ?it/s] Training 1/2 epoch (loss 0.6953): 0%| | 1/840 [00:05<1:13:24, 5.25s/it] Training 1/2 epoch (loss 0.6914): 0%| | 1/840 [00:08<1:13:24, 5.25s/it] Training 1/2 epoch (loss 0.6914): 0%| | 2/840 [00:08<1:00:36, 4.34s/it] Training 1/2 epoch (loss 0.6953): 0%| | 2/840 [00:11<1:00:36, 4.34s/it] Training 1/2 epoch (loss 0.6953): 0%| | 3/840 [00:11<51:24, 3.69s/it] Training 1/2 epoch (loss 0.6914): 0%| | 3/840 [00:14<51:24, 3.69s/it] Training 1/2 epoch (loss 0.6914): 0%| | 4/840 [00:14<47:14, 3.39s/it] Training 1/2 epoch (loss 0.6953): 0%| | 4/840 [00:19<47:14, 3.39s/it] Training 1/2 epoch (loss 0.6953): 1%| | 5/840 [00:19<52:19, 3.76s/it] Training 1/2 epoch (loss 0.6953): 1%| | 5/840 [00:22<52:19, 3.76s/it] Training 1/2 epoch (loss 0.6953): 1%| | 6/840 [00:22<50:49, 3.66s/it] Training 1/2 epoch (loss 0.6953): 1%| | 6/840 [00:26<50:49, 3.66s/it] Training 1/2 epoch (loss 0.6953): 1%| | 7/840 [00:26<51:38, 3.72s/it] Training 1/2 epoch (loss 0.6953): 1%| | 7/840 [00:32<51:38, 3.72s/it] Training 1/2 epoch (loss 0.6953): 1%| | 8/840 [00:32<59:29, 4.29s/it] Training 1/2 epoch (loss 0.6914): 1%| | 8/840 [00:35<59:29, 4.29s/it] Training 1/2 epoch (loss 0.6914): 1%| | 9/840 [00:35<57:58, 4.19s/it] Training 1/2 epoch (loss 0.6914): 1%| | 9/840 [00:39<57:58, 4.19s/it] Training 1/2 epoch (loss 0.6914): 1%| | 10/840 [00:39<53:10, 3.84s/it] Training 1/2 epoch (loss 0.6953): 1%| | 10/840 [00:41<53:10, 3.84s/it] Training 1/2 epoch (loss 0.6953): 1%|▏ | 11/840 [00:41<47:33, 3.44s/it] Training 1/2 epoch (loss 0.6953): 1%|▏ | 11/840 [00:45<47:33, 3.44s/it] Training 1/2 epoch (loss 0.6953): 1%|▏ | 12/840 [00:45<50:24, 3.65s/it] Training 1/2 epoch (loss 0.6953): 1%|▏ | 12/840 [00:48<50:24, 3.65s/it] Training 1/2 epoch (loss 0.6953): 2%|▏ | 13/840 [00:48<45:59, 3.34s/it] Training 1/2 epoch (loss 0.6953): 2%|▏ | 13/840 [00:51<45:59, 3.34s/it] Training 1/2 epoch (loss 0.6953): 2%|▏ | 14/840 [00:51<46:46, 3.40s/it] Training 1/2 epoch 
(loss 0.6914): 2%|▏ | 14/840 [00:54<46:46, 3.40s/it] Training 1/2 epoch (loss 0.6914): 2%|▏ | 15/840 [00:54<44:37, 3.25s/it] Training 1/2 epoch (loss 0.6875): 2%|▏ | 15/840 [00:58<44:37, 3.25s/it] Training 1/2 epoch (loss 0.6875): 2%|▏ | 16/840 [00:58<45:35, 3.32s/it] Training 1/2 epoch (loss 0.6953): 2%|▏ | 16/840 [01:01<45:35, 3.32s/it] Training 1/2 epoch (loss 0.6953): 2%|▏ | 17/840 [01:01<43:42, 3.19s/it] Training 1/2 epoch (loss 0.6914): 2%|▏ | 17/840 [01:04<43:42, 3.19s/it] Training 1/2 epoch (loss 0.6914): 2%|▏ | 18/840 [01:04<44:04, 3.22s/it] Training 1/2 epoch (loss 0.6953): 2%|▏ | 18/840 [01:08<44:04, 3.22s/it] Training 1/2 epoch (loss 0.6953): 2%|▏ | 19/840 [01:08<48:10, 3.52s/it] Training 1/2 epoch (loss 0.6875): 2%|▏ | 19/840 [01:12<48:10, 3.52s/it] Training 1/2 epoch (loss 0.6875): 2%|▏ | 20/840 [01:12<49:31, 3.62s/it] Training 1/2 epoch (loss 0.6953): 2%|▏ | 20/840 [01:16<49:31, 3.62s/it] Training 1/2 epoch (loss 0.6953): 2%|β–Ž | 21/840 [01:16<49:34, 3.63s/it] Training 1/2 epoch (loss 0.6914): 2%|β–Ž | 21/840 [01:18<49:34, 3.63s/it] Training 1/2 epoch (loss 0.6914): 3%|β–Ž | 22/840 [01:18<45:56, 3.37s/it] Training 1/2 epoch (loss 0.6953): 3%|β–Ž | 22/840 [01:21<45:56, 3.37s/it] Training 1/2 epoch (loss 0.6953): 3%|β–Ž | 23/840 [01:21<43:13, 3.17s/it] Training 1/2 epoch (loss 0.6875): 3%|β–Ž | 23/840 [01:25<43:13, 3.17s/it] Training 1/2 epoch (loss 0.6875): 3%|β–Ž | 24/840 [01:25<45:43, 3.36s/it] Training 1/2 epoch (loss 0.6797): 3%|β–Ž | 24/840 [01:30<45:43, 3.36s/it] Training 1/2 epoch (loss 0.6797): 3%|β–Ž | 25/840 [01:30<51:21, 3.78s/it] Training 1/2 epoch (loss 0.6836): 3%|β–Ž | 25/840 [01:35<51:21, 3.78s/it] Training 1/2 epoch (loss 0.6836): 3%|β–Ž | 26/840 [01:35<56:20, 4.15s/it] Training 1/2 epoch (loss 0.6641): 3%|β–Ž | 26/840 [01:38<56:20, 4.15s/it] Training 1/2 epoch (loss 0.6641): 3%|β–Ž | 27/840 [01:38<53:18, 3.93s/it] Training 1/2 epoch (loss 0.6562): 3%|β–Ž | 27/840 [01:42<53:18, 3.93s/it] Training 1/2 epoch (loss 0.6562): 3%|β–Ž | 
28/840 [01:42<50:55, 3.76s/it] Training 1/2 epoch (loss 0.6602): 3%|β–Ž | 28/840 [01:44<50:55, 3.76s/it] Training 1/2 epoch (loss 0.6602): 3%|β–Ž | 29/840 [01:44<45:44, 3.38s/it] Training 1/2 epoch (loss 0.6562): 3%|β–Ž | 29/840 [01:49<45:44, 3.38s/it] Training 1/2 epoch (loss 0.6562): 4%|β–Ž | 30/840 [01:49<52:16, 3.87s/it] Training 1/2 epoch (loss 0.7031): 4%|β–Ž | 30/840 [01:54<52:16, 3.87s/it] Training 1/2 epoch (loss 0.7031): 4%|β–Ž | 31/840 [01:54<55:12, 4.09s/it] Training 1/2 epoch (loss 0.6602): 4%|β–Ž | 31/840 [01:57<55:12, 4.09s/it] Training 1/2 epoch (loss 0.6602): 4%|▍ | 32/840 [01:57<52:52, 3.93s/it] Training 1/2 epoch (loss 0.5938): 4%|▍ | 32/840 [02:03<52:52, 3.93s/it] Training 1/2 epoch (loss 0.5938): 4%|▍ | 33/840 [02:03<58:43, 4.37s/it] Training 1/2 epoch (loss 0.6289): 4%|▍ | 33/840 [02:06<58:43, 4.37s/it] Training 1/2 epoch (loss 0.6289): 4%|▍ | 34/840 [02:06<55:01, 4.10s/it] Training 1/2 epoch (loss 0.6055): 4%|▍ | 34/840 [02:11<55:01, 4.10s/it] Training 1/2 epoch (loss 0.6055): 4%|▍ | 35/840 [02:11<56:57, 4.25s/it] Training 1/2 epoch (loss 0.5391): 4%|▍ | 35/840 [02:14<56:57, 4.25s/it] Training 1/2 epoch (loss 0.5391): 4%|▍ | 36/840 [02:14<54:14, 4.05s/it] Training 1/2 epoch (loss 0.6562): 4%|▍ | 36/840 [02:17<54:14, 4.05s/it] Training 1/2 epoch (loss 0.6562): 4%|▍ | 37/840 [02:17<50:34, 3.78s/it] Training 1/2 epoch (loss 0.5430): 4%|▍ | 37/840 [02:21<50:34, 3.78s/it] Training 1/2 epoch (loss 0.5430): 5%|▍ | 38/840 [02:21<48:03, 3.60s/it] Training 1/2 epoch (loss 0.6016): 5%|▍ | 38/840 [02:25<48:03, 3.60s/it] Training 1/2 epoch (loss 0.6016): 5%|▍ | 39/840 [02:25<52:33, 3.94s/it] Training 1/2 epoch (loss 0.6094): 5%|▍ | 39/840 [02:30<52:33, 3.94s/it] Training 1/2 epoch (loss 0.6094): 5%|▍ | 40/840 [02:30<57:21, 4.30s/it] Training 1/2 epoch (loss 0.6172): 5%|▍ | 40/840 [02:35<57:21, 4.30s/it] Training 1/2 epoch (loss 0.6172): 5%|▍ | 41/840 [02:35<57:42, 4.33s/it] Training 1/2 epoch (loss 0.5938): 5%|▍ | 41/840 [02:38<57:42, 4.33s/it] Training 
1/2 epoch (loss 0.5938): 5%|β–Œ | 42/840 [02:38<54:38, 4.11s/it] Training 1/2 epoch (loss 0.6172): 5%|β–Œ | 42/840 [02:43<54:38, 4.11s/it] Training 1/2 epoch (loss 0.6172): 5%|β–Œ | 43/840 [02:43<55:57, 4.21s/it] Training 1/2 epoch (loss 0.6797): 5%|β–Œ | 43/840 [02:47<55:57, 4.21s/it] Training 1/2 epoch (loss 0.6797): 5%|β–Œ | 44/840 [02:47<55:39, 4.20s/it] Training 1/2 epoch (loss 0.5820): 5%|β–Œ | 44/840 [02:50<55:39, 4.20s/it] Training 1/2 epoch (loss 0.5820): 5%|β–Œ | 45/840 [02:50<49:10, 3.71s/it] Training 1/2 epoch (loss 0.5469): 5%|β–Œ | 45/840 [02:55<49:10, 3.71s/it] Training 1/2 epoch (loss 0.5469): 5%|β–Œ | 46/840 [02:55<56:44, 4.29s/it] Training 1/2 epoch (loss 0.5859): 5%|β–Œ | 46/840 [02:59<56:44, 4.29s/it] Training 1/2 epoch (loss 0.5859): 6%|β–Œ | 47/840 [02:59<54:20, 4.11s/it] Training 1/2 epoch (loss 0.5859): 6%|β–Œ | 47/840 [03:02<54:20, 4.11s/it] Training 1/2 epoch (loss 0.5859): 6%|β–Œ | 48/840 [03:02<50:15, 3.81s/it] Training 1/2 epoch (loss 0.7266): 6%|β–Œ | 48/840 [03:05<50:15, 3.81s/it] Training 1/2 epoch (loss 0.7266): 6%|β–Œ | 49/840 [03:05<45:19, 3.44s/it] Training 1/2 epoch (loss 0.6914): 6%|β–Œ | 49/840 [03:07<45:19, 3.44s/it] Training 1/2 epoch (loss 0.6914): 6%|β–Œ | 50/840 [03:07<42:28, 3.23s/it] Training 1/2 epoch (loss 0.6016): 6%|β–Œ | 50/840 [03:11<42:28, 3.23s/it] Training 1/2 epoch (loss 0.6016): 6%|β–Œ | 51/840 [03:11<46:00, 3.50s/it] Training 1/2 epoch (loss 0.6250): 6%|β–Œ | 51/840 [03:15<46:00, 3.50s/it] Training 1/2 epoch (loss 0.6250): 6%|β–Œ | 52/840 [03:15<44:52, 3.42s/it] Training 1/2 epoch (loss 0.6328): 6%|β–Œ | 52/840 [03:18<44:52, 3.42s/it] Training 1/2 epoch (loss 0.6328): 6%|β–‹ | 53/840 [03:18<42:53, 3.27s/it] Training 1/2 epoch (loss 0.5938): 6%|β–‹ | 53/840 [03:21<42:53, 3.27s/it] Training 1/2 epoch (loss 0.5938): 6%|β–‹ | 54/840 [03:21<42:21, 3.23s/it] Training 1/2 epoch (loss 0.6289): 6%|β–‹ | 54/840 [03:24<42:21, 3.23s/it] Training 1/2 epoch (loss 0.6289): 7%|β–‹ | 55/840 [03:24<43:03, 3.29s/it] Training 
1/2 epoch (loss 0.5859): 7%|β–‹ | 55/840 [03:27<43:03, 3.29s/it] Training 1/2 epoch (loss 0.5859): 7%|β–‹ | 56/840 [03:27<42:25, 3.25s/it] Training 1/2 epoch (loss 0.6719): 7%|β–‹ | 56/840 [03:30<42:25, 3.25s/it] Training 1/2 epoch (loss 0.6719): 7%|β–‹ | 57/840 [03:30<41:27, 3.18s/it] Training 1/2 epoch (loss 0.5859): 7%|β–‹ | 57/840 [03:34<41:27, 3.18s/it] Training 1/2 epoch (loss 0.5859): 7%|β–‹ | 58/840 [03:34<42:55, 3.29s/it] Training 1/2 epoch (loss 0.6406): 7%|β–‹ | 58/840 [03:39<42:55, 3.29s/it] Training 1/2 epoch (loss 0.6406): 7%|β–‹ | 59/840 [03:39<48:06, 3.70s/it] Training 1/2 epoch (loss 0.5312): 7%|β–‹ | 59/840 [03:44<48:06, 3.70s/it] Training 1/2 epoch (loss 0.5312): 7%|β–‹ | 60/840 [03:44<54:55, 4.22s/it] Training 1/2 epoch (loss 0.5547): 7%|β–‹ | 60/840 [03:47<54:55, 4.22s/it] Training 1/2 epoch (loss 0.5547): 7%|β–‹ | 61/840 [03:47<50:47, 3.91s/it] Training 1/2 epoch (loss 0.6914): 7%|β–‹ | 61/840 [03:51<50:47, 3.91s/it] Training 1/2 epoch (loss 0.6914): 7%|β–‹ | 62/840 [03:51<50:41, 3.91s/it] Training 1/2 epoch (loss 0.6484): 7%|β–‹ | 62/840 [03:57<50:41, 3.91s/it] Training 1/2 epoch (loss 0.6484): 8%|β–Š | 63/840 [03:57<56:32, 4.37s/it] Training 1/2 epoch (loss 0.7578): 8%|β–Š | 63/840 [04:01<56:32, 4.37s/it] Training 1/2 epoch (loss 0.7578): 8%|β–Š | 64/840 [04:01<57:20, 4.43s/it] Training 1/2 epoch (loss 0.5820): 8%|β–Š | 64/840 [04:04<57:20, 4.43s/it] Training 1/2 epoch (loss 0.5820): 8%|β–Š | 65/840 [04:04<51:12, 3.97s/it] Training 1/2 epoch (loss 0.5977): 8%|β–Š | 65/840 [04:07<51:12, 3.97s/it] Training 1/2 epoch (loss 0.5977): 8%|β–Š | 66/840 [04:07<46:42, 3.62s/it] Training 1/2 epoch (loss 0.6758): 8%|β–Š | 66/840 [04:11<46:42, 3.62s/it] Training 1/2 epoch (loss 0.6758): 8%|β–Š | 67/840 [04:11<48:13, 3.74s/it] Training 1/2 epoch (loss 0.5938): 8%|β–Š | 67/840 [04:15<48:13, 3.74s/it] Training 1/2 epoch (loss 0.5938): 8%|β–Š | 68/840 [04:15<50:21, 3.91s/it] Training 1/2 epoch (loss 0.6719): 8%|β–Š | 68/840 [04:18<50:21, 3.91s/it] Training 
1/2 epoch (loss 0.6719): 8%|β–Š | 69/840 [04:18<45:26, 3.54s/it] Training 1/2 epoch (loss 0.5508): 8%|β–Š | 69/840 [04:22<45:26, 3.54s/it] Training 1/2 epoch (loss 0.5508): 8%|β–Š | 70/840 [04:22<47:49, 3.73s/it] Training 1/2 epoch (loss 0.5820): 8%|β–Š | 70/840 [04:25<47:49, 3.73s/it] Training 1/2 epoch (loss 0.5820): 8%|β–Š | 71/840 [04:25<46:18, 3.61s/it] Training 1/2 epoch (loss 0.6055): 8%|β–Š | 71/840 [04:31<46:18, 3.61s/it] Training 1/2 epoch (loss 0.6055): 9%|β–Š | 72/840 [04:31<53:57, 4.21s/it] Training 1/2 epoch (loss 0.5938): 9%|β–Š | 72/840 [04:36<53:57, 4.21s/it] Training 1/2 epoch (loss 0.5938): 9%|β–Š | 73/840 [04:36<58:31, 4.58s/it] Training 1/2 epoch (loss 0.6133): 9%|β–Š | 73/840 [04:41<58:31, 4.58s/it] Training 1/2 epoch (loss 0.6133): 9%|β–‰ | 74/840 [04:41<58:52, 4.61s/it] Training 1/2 epoch (loss 0.5938): 9%|β–‰ | 74/840 [04:47<58:52, 4.61s/it] Training 1/2 epoch (loss 0.5938): 9%|β–‰ | 75/840 [04:47<1:02:10, 4.88s/it] Training 1/2 epoch (loss 0.6094): 9%|β–‰ | 75/840 [04:49<1:02:10, 4.88s/it] Training 1/2 epoch (loss 0.6094): 9%|β–‰ | 76/840 [04:49<53:39, 4.21s/it] Training 1/2 epoch (loss 0.6172): 9%|β–‰ | 76/840 [04:52<53:39, 4.21s/it] Training 1/2 epoch (loss 0.6172): 9%|β–‰ | 77/840 [04:52<49:35, 3.90s/it] Training 1/2 epoch (loss 0.6094): 9%|β–‰ | 77/840 [04:55<49:35, 3.90s/it] Training 1/2 epoch (loss 0.6094): 9%|β–‰ | 78/840 [04:55<45:17, 3.57s/it] Training 1/2 epoch (loss 0.5664): 9%|β–‰ | 78/840 [05:00<45:17, 3.57s/it] Training 1/2 epoch (loss 0.5664): 9%|β–‰ | 79/840 [05:00<48:35, 3.83s/it] Training 1/2 epoch (loss 0.6523): 9%|β–‰ | 79/840 [05:03<48:35, 3.83s/it] Training 1/2 epoch (loss 0.6523): 10%|β–‰ | 80/840 [05:03<45:59, 3.63s/it] Training 1/2 epoch (loss 0.5625): 10%|β–‰ | 80/840 [05:06<45:59, 3.63s/it] Training 1/2 epoch (loss 0.5625): 10%|β–‰ | 81/840 [05:06<44:57, 3.55s/it] Training 1/2 epoch (loss 0.6328): 10%|β–‰ | 81/840 [05:09<44:57, 3.55s/it] Training 1/2 epoch (loss 0.6328): 10%|β–‰ | 82/840 [05:09<44:02, 3.49s/it] 
Training 1/2 epoch (loss 0.5781): 10%|β–‰ | 82/840 [05:14<44:02, 3.49s/it] Training 1/2 epoch (loss 0.5781): 10%|β–‰ | 83/840 [05:14<47:30, 3.77s/it] Training 1/2 epoch (loss 0.5625): 10%|β–‰ | 83/840 [05:18<47:30, 3.77s/it] Training 1/2 epoch (loss 0.5625): 10%|β–ˆ | 84/840 [05:18<47:50, 3.80s/it] Training 1/2 epoch (loss 0.6211): 10%|β–ˆ | 84/840 [05:21<47:50, 3.80s/it] Training 1/2 epoch (loss 0.6211): 10%|β–ˆ | 85/840 [05:21<44:31, 3.54s/it] Training 1/2 epoch (loss 0.6953): 10%|β–ˆ | 85/840 [05:24<44:31, 3.54s/it] Training 1/2 epoch (loss 0.6953): 10%|β–ˆ | 86/840 [05:24<41:40, 3.32s/it] Training 1/2 epoch (loss 0.5977): 10%|β–ˆ | 86/840 [05:29<41:40, 3.32s/it] Training 1/2 epoch (loss 0.5977): 10%|β–ˆ | 87/840 [05:29<49:26, 3.94s/it] Training 1/2 epoch (loss 0.6562): 10%|β–ˆ | 87/840 [05:33<49:26, 3.94s/it] Training 1/2 epoch (loss 0.6562): 10%|β–ˆ | 88/840 [05:33<50:02, 3.99s/it] Training 1/2 epoch (loss 0.6250): 10%|β–ˆ | 88/840 [05:37<50:02, 3.99s/it] Training 1/2 epoch (loss 0.6250): 11%|β–ˆ | 89/840 [05:37<48:02, 3.84s/it] Training 1/2 epoch (loss 0.5547): 11%|β–ˆ | 89/840 [05:40<48:02, 3.84s/it] Training 1/2 epoch (loss 0.5547): 11%|β–ˆ | 90/840 [05:40<46:38, 3.73s/it] Training 1/2 epoch (loss 0.6836): 11%|β–ˆ | 90/840 [05:43<46:38, 3.73s/it] Training 1/2 epoch (loss 0.6836): 11%|β–ˆ | 91/840 [05:43<42:55, 3.44s/it] Training 1/2 epoch (loss 0.5664): 11%|β–ˆ | 91/840 [05:46<42:55, 3.44s/it] Training 1/2 epoch (loss 0.5664): 11%|β–ˆ | 92/840 [05:46<41:53, 3.36s/it] Training 1/2 epoch (loss 0.6250): 11%|β–ˆ | 92/840 [05:49<41:53, 3.36s/it] Training 1/2 epoch (loss 0.6250): 11%|β–ˆ | 93/840 [05:49<39:38, 3.18s/it] Training 1/2 epoch (loss 0.5781): 11%|β–ˆ | 93/840 [05:52<39:38, 3.18s/it] Training 1/2 epoch (loss 0.5781): 11%|β–ˆ | 94/840 [05:52<39:49, 3.20s/it] Training 1/2 epoch (loss 0.5977): 11%|β–ˆ | 94/840 [05:56<39:49, 3.20s/it] Training 1/2 epoch (loss 0.5977): 11%|β–ˆβ– | 95/840 [05:56<42:01, 3.38s/it] Training 1/2 epoch (loss 0.5664): 11%|β–ˆβ– | 
95/840 [06:01<42:01, 3.38s/it] Training 1/2 epoch (loss 0.5664): 11%|β–ˆβ– | 96/840 [06:01<50:17, 4.06s/it] Training 1/2 epoch (loss 0.5820): 11%|β–ˆβ– | 96/840 [06:05<50:17, 4.06s/it] Training 1/2 epoch (loss 0.5820): 12%|β–ˆβ– | 97/840 [06:05<50:09, 4.05s/it] Training 1/2 epoch (loss 0.5586): 12%|β–ˆβ– | 97/840 [06:10<50:09, 4.05s/it] Training 1/2 epoch (loss 0.5586): 12%|β–ˆβ– | 98/840 [06:10<50:33, 4.09s/it] Training 1/2 epoch (loss 0.5547): 12%|β–ˆβ– | 98/840 [06:12<50:33, 4.09s/it] Training 1/2 epoch (loss 0.5547): 12%|β–ˆβ– | 99/840 [06:12<46:02, 3.73s/it] Training 1/2 epoch (loss 0.5352): 12%|β–ˆβ– | 99/840 [06:16<46:02, 3.73s/it] Training 1/2 epoch (loss 0.5352): 12%|β–ˆβ– | 100/840 [06:16<43:30, 3.53s/it] Training 1/2 epoch (loss 0.5312): 12%|β–ˆβ– | 100/840 [06:21<43:30, 3.53s/it] Training 1/2 epoch (loss 0.5312): 12%|β–ˆβ– | 101/840 [06:21<50:42, 4.12s/it] Training 1/2 epoch (loss 0.6172): 12%|β–ˆβ– | 101/840 [06:26<50:42, 4.12s/it] Training 1/2 epoch (loss 0.6172): 12%|β–ˆβ– | 102/840 [06:26<52:12, 4.24s/it] Training 1/2 epoch (loss 0.6094): 12%|β–ˆβ– | 102/840 [06:29<52:12, 4.24s/it] Training 1/2 epoch (loss 0.6094): 12%|β–ˆβ– | 103/840 [06:29<47:50, 3.90s/it] Training 1/2 epoch (loss 0.6445): 12%|β–ˆβ– | 103/840 [06:32<47:50, 3.90s/it] Training 1/2 epoch (loss 0.6445): 12%|β–ˆβ– | 104/840 [06:32<46:53, 3.82s/it] Training 1/2 epoch (loss 0.5781): 12%|β–ˆβ– | 104/840 [06:36<46:53, 3.82s/it] Training 1/2 epoch (loss 0.5781): 12%|β–ˆβ–Ž | 105/840 [06:36<47:00, 3.84s/it] Training 1/2 epoch (loss 0.6289): 12%|β–ˆβ–Ž | 105/840 [06:41<47:00, 3.84s/it] Training 1/2 epoch (loss 0.6289): 13%|β–ˆβ–Ž | 106/840 [06:41<49:32, 4.05s/it] Training 1/2 epoch (loss 0.5508): 13%|β–ˆβ–Ž | 106/840 [06:46<49:32, 4.05s/it] Training 1/2 epoch (loss 0.5508): 13%|β–ˆβ–Ž | 107/840 [06:46<54:25, 4.46s/it] Training 1/2 epoch (loss 0.5664): 13%|β–ˆβ–Ž | 107/840 [06:49<54:25, 4.46s/it] Training 1/2 epoch (loss 0.5664): 13%|β–ˆβ–Ž | 108/840 [06:49<49:55, 4.09s/it] 
Training 1/2 epoch (loss 0.5742): 13%|β–ˆβ–Ž | 108/840 [06:53<49:55, 4.09s/it] Training 1/2 epoch (loss 0.5742): 13%|β–ˆβ–Ž | 109/840 [06:53<46:38, 3.83s/it] Training 1/2 epoch (loss 0.6094): 13%|β–ˆβ–Ž | 109/840 [06:58<46:38, 3.83s/it] Training 1/2 epoch (loss 0.6094): 13%|β–ˆβ–Ž | 110/840 [06:58<52:31, 4.32s/it] Training 1/2 epoch (loss 0.6719): 13%|β–ˆβ–Ž | 110/840 [07:01<52:31, 4.32s/it] Training 1/2 epoch (loss 0.6719): 13%|β–ˆβ–Ž | 111/840 [07:01<46:24, 3.82s/it] Training 1/2 epoch (loss 0.5977): 13%|β–ˆβ–Ž | 111/840 [07:06<46:24, 3.82s/it] Training 1/2 epoch (loss 0.5977): 13%|β–ˆβ–Ž | 112/840 [07:06<52:37, 4.34s/it] Training 1/2 epoch (loss 0.6133): 13%|β–ˆβ–Ž | 112/840 [07:12<52:37, 4.34s/it] Training 1/2 epoch (loss 0.6133): 13%|β–ˆβ–Ž | 113/840 [07:12<56:29, 4.66s/it] Training 1/2 epoch (loss 0.6523): 13%|β–ˆβ–Ž | 113/840 [07:15<56:29, 4.66s/it] Training 1/2 epoch (loss 0.6523): 14%|β–ˆβ–Ž | 114/840 [07:15<50:43, 4.19s/it] Training 1/2 epoch (loss 0.5547): 14%|β–ˆβ–Ž | 114/840 [07:18<50:43, 4.19s/it] Training 1/2 epoch (loss 0.5547): 14%|β–ˆβ–Ž | 115/840 [07:18<49:00, 4.06s/it] Training 1/2 epoch (loss 0.5469): 14%|β–ˆβ–Ž | 115/840 [07:22<49:00, 4.06s/it] Training 1/2 epoch (loss 0.5469): 14%|β–ˆβ– | 116/840 [07:22<46:56, 3.89s/it] Training 1/2 epoch (loss 0.5625): 14%|β–ˆβ– | 116/840 [07:25<46:56, 3.89s/it] Training 1/2 epoch (loss 0.5625): 14%|β–ˆβ– | 117/840 [07:25<44:30, 3.69s/it] Training 1/2 epoch (loss 0.6094): 14%|β–ˆβ– | 117/840 [07:30<44:30, 3.69s/it] Training 1/2 epoch (loss 0.6094): 14%|β–ˆβ– | 118/840 [07:30<46:47, 3.89s/it] Training 1/2 epoch (loss 0.6797): 14%|β–ˆβ– | 118/840 [07:34<46:47, 3.89s/it] Training 1/2 epoch (loss 0.6797): 14%|β–ˆβ– | 119/840 [07:34<49:41, 4.14s/it] Training 1/2 epoch (loss 0.6172): 14%|β–ˆβ– | 119/840 [07:38<49:41, 4.14s/it] Training 1/2 epoch (loss 0.6172): 14%|β–ˆβ– | 120/840 [07:38<46:26, 3.87s/it] Training 1/2 epoch (loss 0.5117): 14%|β–ˆβ– | 120/840 [07:41<46:26, 3.87s/it] Training 1/2 epoch (loss 
0.5117): 14%|β–ˆβ– | 121/840 [07:41<46:22, 3.87s/it] Training 1/2 epoch (loss 0.5859): 14%|β–ˆβ– | 121/840 [07:45<46:22, 3.87s/it] Training 1/2 epoch (loss 0.5859): 15%|β–ˆβ– | 122/840 [07:45<45:58, 3.84s/it] Training 1/2 epoch (loss 0.6523): 15%|β–ˆβ– | 122/840 [07:48<45:58, 3.84s/it] Training 1/2 epoch (loss 0.6523): 15%|β–ˆβ– | 123/840 [07:48<43:21, 3.63s/it] Training 1/2 epoch (loss 0.4961): 15%|β–ˆβ– | 123/840 [07:52<43:21, 3.63s/it] Training 1/2 epoch (loss 0.4961): 15%|β–ˆβ– | 124/840 [07:52<45:04, 3.78s/it] Training 1/2 epoch (loss 0.5547): 15%|β–ˆβ– | 124/840 [07:55<45:04, 3.78s/it] Training 1/2 epoch (loss 0.5547): 15%|β–ˆβ– | 125/840 [07:55<40:59, 3.44s/it] Training 1/2 epoch (loss 0.5586): 15%|β–ˆβ– | 125/840 [07:58<40:59, 3.44s/it] Training 1/2 epoch (loss 0.5586): 15%|β–ˆβ–Œ | 126/840 [07:58<40:06, 3.37s/it] Training 1/2 epoch (loss 0.5586): 15%|β–ˆβ–Œ | 126/840 [08:01<40:06, 3.37s/it] Training 1/2 epoch (loss 0.5586): 15%|β–ˆβ–Œ | 127/840 [08:01<39:16, 3.31s/it] Training 1/2 epoch (loss 0.6484): 15%|β–ˆβ–Œ | 127/840 [08:07<39:16, 3.31s/it] Training 1/2 epoch (loss 0.6484): 15%|β–ˆβ–Œ | 128/840 [08:07<47:01, 3.96s/it] Training 1/2 epoch (loss 0.5820): 15%|β–ˆβ–Œ | 128/840 [08:10<47:01, 3.96s/it] Training 1/2 epoch (loss 0.5820): 15%|β–ˆβ–Œ | 129/840 [08:10<43:25, 3.66s/it] Training 1/2 epoch (loss 0.6172): 15%|β–ˆβ–Œ | 129/840 [08:13<43:25, 3.66s/it] Training 1/2 epoch (loss 0.6172): 15%|β–ˆβ–Œ | 130/840 [08:13<41:16, 3.49s/it] Training 1/2 epoch (loss 0.5703): 15%|β–ˆβ–Œ | 130/840 [08:16<41:16, 3.49s/it] Training 1/2 epoch (loss 0.5703): 16%|β–ˆβ–Œ | 131/840 [08:16<38:30, 3.26s/it] Training 1/2 epoch (loss 0.6055): 16%|β–ˆβ–Œ | 131/840 [08:21<38:30, 3.26s/it] Training 1/2 epoch (loss 0.6055): 16%|β–ˆβ–Œ | 132/840 [08:21<44:13, 3.75s/it] Training 1/2 epoch (loss 0.5625): 16%|β–ˆβ–Œ | 132/840 [08:25<44:13, 3.75s/it] Training 1/2 epoch (loss 0.5625): 16%|β–ˆβ–Œ | 133/840 [08:25<45:08, 3.83s/it] Training 1/2 epoch (loss 0.6797): 16%|β–ˆβ–Œ | 
133/840 [08:29<45:08, 3.83s/it] Training 1/2 epoch (loss 0.6797): 16%|β–ˆβ–Œ | 134/840 [08:29<47:18, 4.02s/it] Training 1/2 epoch (loss 0.6016): 16%|β–ˆβ–Œ | 134/840 [08:32<47:18, 4.02s/it] Training 1/2 epoch (loss 0.6016): 16%|β–ˆβ–Œ | 135/840 [08:32<44:09, 3.76s/it] Training 1/2 epoch (loss 0.6602): 16%|β–ˆβ–Œ | 135/840 [08:36<44:09, 3.76s/it] Training 1/2 epoch (loss 0.6602): 16%|β–ˆβ–Œ | 136/840 [08:36<44:22, 3.78s/it] Training 1/2 epoch (loss 0.5430): 16%|β–ˆβ–Œ | 136/840 [08:39<44:22, 3.78s/it] Training 1/2 epoch (loss 0.5430): 16%|β–ˆβ–‹ | 137/840 [08:39<40:15, 3.44s/it] Training 1/2 epoch (loss 0.5508): 16%|β–ˆβ–‹ | 137/840 [08:43<40:15, 3.44s/it] Training 1/2 epoch (loss 0.5508): 16%|β–ˆβ–‹ | 138/840 [08:43<44:42, 3.82s/it] Training 1/2 epoch (loss 0.6172): 16%|β–ˆβ–‹ | 138/840 [08:47<44:42, 3.82s/it] Training 1/2 epoch (loss 0.6172): 17%|β–ˆβ–‹ | 139/840 [08:47<45:27, 3.89s/it] Training 1/2 epoch (loss 0.5039): 17%|β–ˆβ–‹ | 139/840 [08:51<45:27, 3.89s/it] Training 1/2 epoch (loss 0.5039): 17%|β–ˆβ–‹ | 140/840 [08:51<43:36, 3.74s/it] Training 1/2 epoch (loss 0.5391): 17%|β–ˆβ–‹ | 140/840 [08:56<43:36, 3.74s/it] Training 1/2 epoch (loss 0.5391): 17%|β–ˆβ–‹ | 141/840 [08:56<47:11, 4.05s/it] Training 1/2 epoch (loss 0.5547): 17%|β–ˆβ–‹ | 141/840 [09:01<47:11, 4.05s/it] Training 1/2 epoch (loss 0.5547): 17%|β–ˆβ–‹ | 142/840 [09:01<52:18, 4.50s/it] Training 1/2 epoch (loss 0.5742): 17%|β–ˆβ–‹ | 142/840 [09:05<52:18, 4.50s/it] Training 1/2 epoch (loss 0.5742): 17%|β–ˆβ–‹ | 143/840 [09:05<49:46, 4.28s/it] Training 1/2 epoch (loss 0.4531): 17%|β–ˆβ–‹ | 143/840 [09:08<49:46, 4.28s/it] Training 1/2 epoch (loss 0.4531): 17%|β–ˆβ–‹ | 144/840 [09:08<45:43, 3.94s/it] Training 1/2 epoch (loss 0.6133): 17%|β–ˆβ–‹ | 144/840 [09:11<45:43, 3.94s/it] Training 1/2 epoch (loss 0.6133): 17%|β–ˆβ–‹ | 145/840 [09:11<42:21, 3.66s/it] Training 1/2 epoch (loss 0.5312): 17%|β–ˆβ–‹ | 145/840 [09:16<42:21, 3.66s/it] Training 1/2 epoch (loss 0.5312): 17%|β–ˆβ–‹ | 146/840 [09:16<47:11, 
4.08s/it] Training 1/2 epoch (loss 0.5742): 17%|█▋ | 146/840 [09:20<47:11, 4.08s/it] Training 1/2 epoch (loss 0.5742): 18%|█▊ | 147/840 [09:20<45:58, 3.98s/it] Training 1/2 epoch (loss 0.6641): 18%|█▊ | 147/840 [09:24<45:58, 3.98s/it] Training 1/2 epoch (loss 0.6641): 18%|█▊ | 148/840 [09:24<45:39, 3.96s/it] Training 1/2 epoch (loss 0.5859): 18%|█▊ | 148/840 [09:28<45:39, 3.96s/it] Training 1/2 epoch (loss 0.5859): 18%|█▊ | 149/840 [09:28<45:25, 3.94s/it] Training 1/2 epoch (loss 0.6172): 18%|█▊ | 149/840 [09:31<45:25, 3.94s/it] Training 1/2 epoch (loss 0.6172): 18%|█▊ | 150/840 [09:31<42:35, 3.70s/it] Training 1/2 epoch (loss 0.6562): 18%|█▊ | 150/840 [09:35<42:35, 3.70s/it] Training 1/2 epoch (loss 0.6562): 18%|█▊ | 151/840 [09:35<42:29, 3.70s/it] Training 1/2 epoch (loss 0.5664): 18%|█▊ | 151/840 [09:38<42:29, 3.70s/it] Training 1/2 epoch (loss 0.5664): 18%|█▊ | 152/840 [09:38<42:39, 3.72s/it] Training 1/2 epoch (loss 0.4863): 18%|█▊ | 152/840 [09:42<42:39, 3.72s/it] Training 1/2 epoch (loss 0.4863): 18%|█▊ | 153/840 [09:42<42:37, 3.72s/it] Training 1/2 epoch (loss 0.5625): 18%|█▊ | 153/840 [09:45<42:37, 3.72s/it] Training 1/2 epoch (loss 0.5625): 18%|█▊ | 154/840 [09:45<40:46, 3.57s/it] Training 1/2 epoch (loss 0.7422): 18%|█▊ | 154/840 [09:50<40:46, 3.57s/it] Training 1/2 epoch (loss 0.7422): 18%|█▊ | 155/840 [09:50<44:16, 3.88s/it] Training 1/2 epoch (loss 0.5703): 18%|█▊ | 155/840 [09:54<44:16, 3.88s/it] Training 1/2 epoch (loss 0.5703): 19%|█▊ | 156/840 [09:54<43:53, 3.85s/it] Training 1/2 epoch (loss 0.5625): 19%|█▊ | 156/840 [09:57<43:53, 3.85s/it] Training 1/2 epoch (loss 0.5625): 19%|█▊ | 157/840 [09:57<41:39, 3.66s/it] Training 1/2 epoch (loss 0.5742): 19%|█▊ | 157/840 [10:00<41:39, 3.66s/it] Training 1/2 epoch (loss 0.5742): 19%|█▉ | 158/840 [10:00<38:51, 3.42s/it] Training 1/2 epoch (loss 0.5195): 19%|█▉ | 158/840 [10:03<38:51, 3.42s/it] Training 1/2 
epoch (loss 0.5195): 19%|█▉ | 159/840 [10:03<39:53, 3.52s/it] Training 1/2 epoch (loss 0.6211): 19%|█▉ | 159/840 [10:08<39:53, 3.52s/it] Training 1/2 epoch (loss 0.6211): 19%|█▉ | 160/840 [10:08<42:35, 3.76s/it] Training 1/2 epoch (loss 0.5781): 19%|█▉ | 160/840 [10:12<42:35, 3.76s/it] Training 1/2 epoch (loss 0.5781): 19%|█▉ | 161/840 [10:12<44:55, 3.97s/it] Training 1/2 epoch (loss 0.6797): 19%|█▉ | 161/840 [10:18<44:55, 3.97s/it] Training 1/2 epoch (loss 0.6797): 19%|█▉ | 162/840 [10:18<50:23, 4.46s/it] Training 1/2 epoch (loss 0.6367): 19%|█▉ | 162/840 [10:22<50:23, 4.46s/it] Training 1/2 epoch (loss 0.6367): 19%|█▉ | 163/840 [10:22<49:29, 4.39s/it] Training 1/2 epoch (loss 0.5469): 19%|█▉ | 163/840 [10:25<49:29, 4.39s/it] Training 1/2 epoch (loss 0.5469): 20%|█▉ | 164/840 [10:25<45:14, 4.02s/it] Training 1/2 epoch (loss 0.5781): 20%|█▉ | 164/840 [10:29<45:14, 4.02s/it] Training 1/2 epoch (loss 0.5781): 20%|█▉ | 165/840 [10:29<44:03, 3.92s/it] Training 1/2 epoch (loss 0.5859): 20%|█▉ | 165/840 [10:32<44:03, 3.92s/it] Training 1/2 epoch (loss 0.5859): 20%|█▉ | 166/840 [10:32<42:16, 3.76s/it] Training 1/2 epoch (loss 0.5000): 20%|█▉ | 166/840 [10:35<42:16, 3.76s/it] Training 1/2 epoch (loss 0.5000): 20%|█▉ | 167/840 [10:35<39:59, 3.56s/it] Training 1/2 epoch (loss 0.5195): 20%|█▉ | 167/840 [10:39<39:59, 3.56s/it] Training 1/2 epoch (loss 0.5195): 20%|██ | 168/840 [10:39<39:59, 3.57s/it] Training 1/2 epoch (loss 0.5742): 20%|██ | 168/840 [10:42<39:59, 3.57s/it] Training 1/2 epoch (loss 0.5742): 20%|██ | 169/840 [10:42<39:15, 3.51s/it] Training 1/2 epoch (loss 0.5781): 20%|██ | 169/840 [10:46<39:15, 3.51s/it] Training 1/2 epoch (loss 0.5781): 20%|██ | 170/840 [10:46<39:32, 3.54s/it] Training 1/2 epoch (loss 0.5820): 20%|██ | 170/840 [10:50<39:32, 3.54s/it] Training 1/2 epoch (loss 0.5820): 20%|██ | 171/840 [10:50<41:20, 3.71s/it] Training 1/2 epoch (loss 0.5430): 
20%|██ | 171/840 [10:54<41:20, 3.71s/it] Training 1/2 epoch (loss 0.5430): 20%|██ | 172/840 [10:54<40:56, 3.68s/it] Training 1/2 epoch (loss 0.5469): 20%|██ | 172/840 [10:57<40:56, 3.68s/it] Training 1/2 epoch (loss 0.5469): 21%|██ | 173/840 [10:57<41:14, 3.71s/it] Training 1/2 epoch (loss 0.5938): 21%|██ | 173/840 [11:00<41:14, 3.71s/it] Training 1/2 epoch (loss 0.5938): 21%|██ | 174/840 [11:00<38:10, 3.44s/it] Training 1/2 epoch (loss 0.5938): 21%|██ | 174/840 [11:03<38:10, 3.44s/it] Training 1/2 epoch (loss 0.5938): 21%|██ | 175/840 [11:03<35:42, 3.22s/it] Training 1/2 epoch (loss 0.5547): 21%|██ | 175/840 [11:06<35:42, 3.22s/it] Training 1/2 epoch (loss 0.5547): 21%|██ | 176/840 [11:06<35:27, 3.20s/it] Training 1/2 epoch (loss 0.5938): 21%|██ | 176/840 [11:12<35:27, 3.20s/it] Training 1/2 epoch (loss 0.5938): 21%|██ | 177/840 [11:12<42:50, 3.88s/it] Training 1/2 epoch (loss 0.6094): 21%|██ | 177/840 [11:15<42:50, 3.88s/it] Training 1/2 epoch (loss 0.6094): 21%|██ | 178/840 [11:15<42:03, 3.81s/it] Training 1/2 epoch (loss 0.5938): 21%|██ | 178/840 [11:20<42:03, 3.81s/it] Training 1/2 epoch (loss 0.5938): 21%|██▏ | 179/840 [11:20<43:27, 3.94s/it] Training 1/2 epoch (loss 0.5547): 21%|██▏ | 179/840 [11:22<43:27, 3.94s/it] Training 1/2 epoch (loss 0.5547): 21%|██▏ | 180/840 [11:22<38:58, 3.54s/it] Training 1/2 epoch (loss 0.4941): 21%|██▏ | 180/840 [11:25<38:58, 3.54s/it] Training 1/2 epoch (loss 0.4941): 22%|██▏ | 181/840 [11:25<37:41, 3.43s/it] Training 1/2 epoch (loss 0.5391): 22%|██▏ | 181/840 [11:28<37:41, 3.43s/it] Training 1/2 epoch (loss 0.5391): 22%|██▏ | 182/840 [11:28<35:17, 3.22s/it] Training 1/2 epoch (loss 0.5586): 22%|██▏ | 182/840 [11:31<35:17, 3.22s/it] Training 1/2 epoch (loss 0.5586): 22%|██▏ | 183/840 [11:31<34:56, 3.19s/it] Training 1/2 epoch (loss 0.5664): 22%|██▏ | 183/840 [11:34<34:56, 3.19s/it] Training 1/2 epoch (loss 
0.5664): 22%|██▏ | 184/840 [11:34<34:53, 3.19s/it] Training 1/2 epoch (loss 0.6250): 22%|██▏ | 184/840 [11:38<34:53, 3.19s/it] Training 1/2 epoch (loss 0.6250): 22%|██▏ | 185/840 [11:38<35:53, 3.29s/it] Training 1/2 epoch (loss 0.5625): 22%|██▏ | 185/840 [11:42<35:53, 3.29s/it] Training 1/2 epoch (loss 0.5625): 22%|██▏ | 186/840 [11:42<39:21, 3.61s/it] Training 1/2 epoch (loss 0.6328): 22%|██▏ | 186/840 [11:46<39:21, 3.61s/it] Training 1/2 epoch (loss 0.6328): 22%|██▏ | 187/840 [11:46<38:48, 3.57s/it] Training 1/2 epoch (loss 0.6055): 22%|██▏ | 187/840 [11:50<38:48, 3.57s/it] Training 1/2 epoch (loss 0.6055): 22%|██▏ | 188/840 [11:50<41:44, 3.84s/it] Training 1/2 epoch (loss 0.6289): 22%|██▏ | 188/840 [11:53<41:44, 3.84s/it] Training 1/2 epoch (loss 0.6289): 22%|██▎ | 189/840 [11:53<39:10, 3.61s/it] Training 1/2 epoch (loss 0.5625): 22%|██▎ | 189/840 [11:57<39:10, 3.61s/it] Training 1/2 epoch (loss 0.5625): 23%|██▎ | 190/840 [11:57<38:05, 3.52s/it] Training 1/2 epoch (loss 0.4395): 23%|██▎ | 190/840 [12:00<38:05, 3.52s/it] Training 1/2 epoch (loss 0.4395): 23%|██▎ | 191/840 [12:00<39:07, 3.62s/it] Training 1/2 epoch (loss 0.6641): 23%|██▎ | 191/840 [12:04<39:07, 3.62s/it] Training 1/2 epoch (loss 0.6641): 23%|██▎ | 192/840 [12:04<38:55, 3.60s/it] Training 1/2 epoch (loss 0.5312): 23%|██▎ | 192/840 [12:07<38:55, 3.60s/it] Training 1/2 epoch (loss 0.5312): 23%|██▎ | 193/840 [12:07<37:09, 3.45s/it] Training 1/2 epoch (loss 0.5195): 23%|██▎ | 193/840 [12:11<37:09, 3.45s/it] Training 1/2 epoch (loss 0.5195): 23%|██▎ | 194/840 [12:11<38:19, 3.56s/it] Training 1/2 epoch (loss 0.5352): 23%|██▎ | 194/840 [12:14<38:19, 3.56s/it] Training 1/2 epoch (loss 0.5352): 23%|██▎ | 195/840 [12:14<35:59, 3.35s/it] Training 1/2 epoch (loss 0.5859): 23%|██▎ | 195/840 [12:17<35:59, 3.35s/it] Training 1/2 epoch (loss 0.5859): 23%|██▎ | 
196/840 [12:17<36:03, 3.36s/it] Training 1/2 epoch (loss 0.5625): 23%|██▎ | 196/840 [12:23<36:03, 3.36s/it] Training 1/2 epoch (loss 0.5625): 23%|██▎ | 197/840 [12:23<42:51, 4.00s/it] Training 1/2 epoch (loss 0.5547): 23%|██▎ | 197/840 [12:26<42:51, 4.00s/it] Training 1/2 epoch (loss 0.5547): 24%|██▎ | 198/840 [12:26<42:06, 3.94s/it] Training 1/2 epoch (loss 0.5859): 24%|██▎ | 198/840 [12:29<42:06, 3.94s/it] Training 1/2 epoch (loss 0.5859): 24%|██▎ | 199/840 [12:29<39:15, 3.67s/it] Training 1/2 epoch (loss 0.5547): 24%|██▎ | 199/840 [12:33<39:15, 3.67s/it] Training 1/2 epoch (loss 0.5547): 24%|██▍ | 200/840 [12:33<38:27, 3.61s/it] Training 1/2 epoch (loss 0.6406): 24%|██▍ | 200/840 [12:37<38:27, 3.61s/it] Training 1/2 epoch (loss 0.6406): 24%|██▍ | 201/840 [12:37<38:38, 3.63s/it] Training 1/2 epoch (loss 0.5586): 24%|██▍ | 201/840 [12:40<38:38, 3.63s/it] Training 1/2 epoch (loss 0.5586): 24%|██▍ | 202/840 [12:40<38:21, 3.61s/it] Training 1/2 epoch (loss 0.5352): 24%|██▍ | 202/840 [12:43<38:21, 3.61s/it] Training 1/2 epoch (loss 0.5352): 24%|██▍ | 203/840 [12:43<35:06, 3.31s/it] Training 1/2 epoch (loss 0.6211): 24%|██▍ | 203/840 [12:47<35:06, 3.31s/it] Training 1/2 epoch (loss 0.6211): 24%|██▍ | 204/840 [12:47<36:36, 3.45s/it] Training 1/2 epoch (loss 0.5469): 24%|██▍ | 204/840 [12:51<36:36, 3.45s/it] Training 1/2 epoch (loss 0.5469): 24%|██▍ | 205/840 [12:51<41:09, 3.89s/it] Training 1/2 epoch (loss 0.5469): 24%|██▍ | 205/840 [12:55<41:09, 3.89s/it] Training 1/2 epoch (loss 0.5469): 25%|██▍ | 206/840 [12:55<39:41, 3.76s/it] Training 1/2 epoch (loss 0.5664): 25%|██▍ | 206/840 [12:58<39:41, 3.76s/it] Training 1/2 epoch (loss 0.5664): 25%|██▍ | 207/840 [12:58<38:05, 3.61s/it] Training 1/2 epoch (loss 0.5234): 25%|██▍ | 207/840 [13:03<38:05, 3.61s/it] Training 1/2 epoch (loss 0.5234): 25%|██▍ | 208/840 [13:03<41:36, 3.95s/it] 
Training 1/2 epoch (loss 0.4707): 25%|██▍ | 208/840 [13:06<41:36, 3.95s/it] Training 1/2 epoch (loss 0.4707): 25%|██▍ | 209/840 [13:06<37:46, 3.59s/it] Training 1/2 epoch (loss 0.5781): 25%|██▍ | 209/840 [13:09<37:46, 3.59s/it] Training 1/2 epoch (loss 0.5781): 25%|██▌ | 210/840 [13:09<36:41, 3.49s/it] Training 1/2 epoch (loss 0.6641): 25%|██▌ | 210/840 [13:14<36:41, 3.49s/it] Training 1/2 epoch (loss 0.6641): 25%|██▌ | 211/840 [13:14<42:53, 4.09s/it] Training 1/2 epoch (loss 0.5586): 25%|██▌ | 211/840 [13:17<42:53, 4.09s/it] Training 1/2 epoch (loss 0.5586): 25%|██▌ | 212/840 [13:17<39:21, 3.76s/it] Training 1/2 epoch (loss 0.4805): 25%|██▌ | 212/840 [13:21<39:21, 3.76s/it] Training 1/2 epoch (loss 0.4805): 25%|██▌ | 213/840 [13:21<37:16, 3.57s/it] Training 1/2 epoch (loss 0.6680): 25%|██▌ | 213/840 [13:24<37:16, 3.57s/it] Training 1/2 epoch (loss 0.6680): 25%|██▌ | 214/840 [13:24<37:07, 3.56s/it] Training 1/2 epoch (loss 0.5117): 25%|██▌ | 214/840 [13:30<37:07, 3.56s/it] Training 1/2 epoch (loss 0.5117): 26%|██▌ | 215/840 [13:30<43:01, 4.13s/it] Training 1/2 epoch (loss 0.5938): 26%|██▌ | 215/840 [13:34<43:01, 4.13s/it] Training 1/2 epoch (loss 0.5938): 26%|██▌ | 216/840 [13:34<42:56, 4.13s/it] Training 1/2 epoch (loss 0.5234): 26%|██▌ | 216/840 [13:37<42:56, 4.13s/it] Training 1/2 epoch (loss 0.5234): 26%|██▌ | 217/840 [13:37<42:01, 4.05s/it] Training 1/2 epoch (loss 0.4980): 26%|██▌ | 217/840 [13:42<42:01, 4.05s/it] Training 1/2 epoch (loss 0.4980): 26%|██▌ | 218/840 [13:42<44:51, 4.33s/it] Training 1/2 epoch (loss 0.6172): 26%|██▌ | 218/840 [13:46<44:51, 4.33s/it] Training 1/2 epoch (loss 0.6172): 26%|██▌ | 219/840 [13:46<42:43, 4.13s/it] Training 1/2 epoch (loss 0.4844): 26%|██▌ | 219/840 [13:50<42:43, 4.13s/it] Training 1/2 epoch (loss 0.4844): 26%|██▌ | 220/840 [13:50<43:17, 4.19s/it] Training 1/2 epoch (loss 
0.5000): 26%|██▌ | 220/840 [13:53<43:17, 4.19s/it] Training 1/2 epoch (loss 0.5000): 26%|██▋ | 221/840 [13:53<38:47, 3.76s/it] Training 1/2 epoch (loss 0.5000): 26%|██▋ | 221/840 [13:58<38:47, 3.76s/it] Training 1/2 epoch (loss 0.5000): 26%|██▋ | 222/840 [13:58<42:18, 4.11s/it] Training 1/2 epoch (loss 0.6484): 26%|██▋ | 222/840 [14:01<42:18, 4.11s/it] Training 1/2 epoch (loss 0.6484): 27%|██▋ | 223/840 [14:01<39:47, 3.87s/it] Training 1/2 epoch (loss 0.5547): 27%|██▋ | 223/840 [14:04<39:47, 3.87s/it] Training 1/2 epoch (loss 0.5547): 27%|██▋ | 224/840 [14:04<35:40, 3.48s/it] Training 1/2 epoch (loss 0.5312): 27%|██▋ | 224/840 [14:08<35:40, 3.48s/it] Training 1/2 epoch (loss 0.5312): 27%|██▋ | 225/840 [14:08<38:42, 3.78s/it] Training 1/2 epoch (loss 0.5078): 27%|██▋ | 225/840 [14:12<38:42, 3.78s/it] Training 1/2 epoch (loss 0.5078): 27%|██▋ | 226/840 [14:12<38:07, 3.73s/it] Training 1/2 epoch (loss 0.5078): 27%|██▋ | 226/840 [14:15<38:07, 3.73s/it] Training 1/2 epoch (loss 0.5078): 27%|██▋ | 227/840 [14:15<36:02, 3.53s/it] Training 1/2 epoch (loss 0.6602): 27%|██▋ | 227/840 [14:21<36:02, 3.53s/it] Training 1/2 epoch (loss 0.6602): 27%|██▋ | 228/840 [14:21<42:18, 4.15s/it] Training 1/2 epoch (loss 0.5156): 27%|██▋ | 228/840 [14:24<42:18, 4.15s/it] Training 1/2 epoch (loss 0.5156): 27%|██▋ | 229/840 [14:24<39:52, 3.92s/it] Training 1/2 epoch (loss 0.5938): 27%|██▋ | 229/840 [14:29<39:52, 3.92s/it] Training 1/2 epoch (loss 0.5938): 27%|██▋ | 230/840 [14:29<41:38, 4.10s/it] Training 1/2 epoch (loss 0.5469): 27%|██▋ | 230/840 [14:32<41:38, 4.10s/it] Training 1/2 epoch (loss 0.5469): 28%|██▊ | 231/840 [14:32<37:52, 3.73s/it] Training 1/2 epoch (loss 0.5586): 28%|██▊ | 231/840 [14:35<37:52, 3.73s/it] Training 1/2 epoch (loss 0.5586): 28%|██▊ | 232/840 [14:35<36:11, 3.57s/it] Training 1/2 epoch (loss 0.5898): 28%|██▊ | 
232/840 [14:39<36:11, 3.57s/it] Training 1/2 epoch (loss 0.5898): 28%|██▊ | 233/840 [14:39<39:18, 3.89s/it] Training 1/2 epoch (loss 0.5742): 28%|██▊ | 233/840 [14:42<39:18, 3.89s/it] Training 1/2 epoch (loss 0.5742): 28%|██▊ | 234/840 [14:42<36:07, 3.58s/it] Training 1/2 epoch (loss 0.6562): 28%|██▊ | 234/840 [14:45<36:07, 3.58s/it] Training 1/2 epoch (loss 0.6562): 28%|██▊ | 235/840 [14:45<33:48, 3.35s/it] Training 1/2 epoch (loss 0.5938): 28%|██▊ | 235/840 [14:51<33:48, 3.35s/it] Training 1/2 epoch (loss 0.5938): 28%|██▊ | 236/840 [14:51<40:30, 4.02s/it] Training 1/2 epoch (loss 0.4961): 28%|██▊ | 236/840 [14:55<40:30, 4.02s/it] Training 1/2 epoch (loss 0.4961): 28%|██▊ | 237/840 [14:55<41:53, 4.17s/it] Training 1/2 epoch (loss 0.6016): 28%|██▊ | 237/840 [14:58<41:53, 4.17s/it] Training 1/2 epoch (loss 0.6016): 28%|██▊ | 238/840 [14:58<38:56, 3.88s/it] Training 1/2 epoch (loss 0.6172): 28%|██▊ | 238/840 [15:04<38:56, 3.88s/it] Training 1/2 epoch (loss 0.6172): 28%|██▊ | 239/840 [15:04<43:34, 4.35s/it] Training 1/2 epoch (loss 0.5039): 28%|██▊ | 239/840 [15:08<43:34, 4.35s/it] Training 1/2 epoch (loss 0.5039): 29%|██▊ | 240/840 [15:08<41:52, 4.19s/it] Training 1/2 epoch (loss 0.5078): 29%|██▊ | 240/840 [15:12<41:52, 4.19s/it] Training 1/2 epoch (loss 0.5078): 29%|██▊ | 241/840 [15:12<42:46, 4.28s/it] Training 1/2 epoch (loss 0.5078): 29%|██▊ | 241/840 [15:16<42:46, 4.28s/it] Training 1/2 epoch (loss 0.5078): 29%|██▉ | 242/840 [15:16<41:55, 4.21s/it] Training 1/2 epoch (loss 0.5859): 29%|██▉ | 242/840 [15:20<41:55, 4.21s/it] Training 1/2 epoch (loss 0.5859): 29%|██▉ | 243/840 [15:20<39:42, 3.99s/it] Training 1/2 epoch (loss 0.5430): 29%|██▉ | 243/840 [15:25<39:42, 3.99s/it] Training 1/2 epoch (loss 0.5430): 29%|██▉ | 244/840 [15:25<44:13, 4.45s/it] Training 1/2 epoch (loss 0.5430): 29%|██▉ | 244/840 [15:28<44:13, 4.45s/it] 
Training 1/2 epoch (loss 0.5430): 29%|██▉ | 245/840 [15:28<40:21, 4.07s/it] Training 1/2 epoch (loss 0.6172): 29%|██▉ | 245/840 [15:31<40:21, 4.07s/it] Training 1/2 epoch (loss 0.6172): 29%|██▉ | 246/840 [15:31<37:37, 3.80s/it] Training 1/2 epoch (loss 0.5625): 29%|██▉ | 246/840 [15:34<37:37, 3.80s/it] Training 1/2 epoch (loss 0.5625): 29%|██▉ | 247/840 [15:34<33:59, 3.44s/it] Training 1/2 epoch (loss 0.5469): 29%|██▉ | 247/840 [15:40<33:59, 3.44s/it] Training 1/2 epoch (loss 0.5469): 30%|██▉ | 248/840 [15:40<40:13, 4.08s/it] Training 1/2 epoch (loss 0.5664): 30%|██▉ | 248/840 [15:43<40:13, 4.08s/it] Training 1/2 epoch (loss 0.5664): 30%|██▉ | 249/840 [15:43<37:10, 3.77s/it] Training 1/2 epoch (loss 0.5430): 30%|██▉ | 249/840 [15:46<37:10, 3.77s/it] Training 1/2 epoch (loss 0.5430): 30%|██▉ | 250/840 [15:46<34:46, 3.54s/it] Training 1/2 epoch (loss 0.5156): 30%|██▉ | 250/840 [15:49<34:46, 3.54s/it] Training 1/2 epoch (loss 0.5156): 30%|██▉ | 251/840 [15:49<35:15, 3.59s/it] Training 1/2 epoch (loss 0.5195): 30%|██▉ | 251/840 [15:53<35:15, 3.59s/it] Training 1/2 epoch (loss 0.5195): 30%|███ | 252/840 [15:53<34:10, 3.49s/it] Training 1/2 epoch (loss 0.5352): 30%|███ | 252/840 [15:56<34:10, 3.49s/it] Training 1/2 epoch (loss 0.5352): 30%|███ | 253/840 [15:56<34:34, 3.53s/it] Training 1/2 epoch (loss 0.5625): 30%|███ | 253/840 [16:00<34:34, 3.53s/it] Training 1/2 epoch (loss 0.5625): 30%|███ | 254/840 [16:00<35:41, 3.65s/it] Training 1/2 epoch (loss 0.5391): 30%|███ | 254/840 [16:04<35:41, 3.65s/it] Training 1/2 epoch (loss 0.5391): 30%|███ | 255/840 [16:04<34:42, 3.56s/it] Training 1/2 epoch (loss 0.5391): 30%|███ | 255/840 [16:07<34:42, 3.56s/it] Training 1/2 epoch (loss 0.5391): 30%|███ | 256/840 [16:07<35:35, 3.66s/it] Training 1/2 epoch (loss 0.5547): 30%|███ | 256/840 [16:12<35:35, 3.66s/it] Training 1/2 epoch (loss 
0.5547): 31%|███ | 257/840 [16:12<38:50, 4.00s/it] Training 1/2 epoch (loss 0.4883): 31%|███ | 257/840 [16:16<38:50, 4.00s/it] Training 1/2 epoch (loss 0.4883): 31%|███ | 258/840 [16:16<36:53, 3.80s/it] Training 1/2 epoch (loss 0.6133): 31%|███ | 258/840 [16:19<36:53, 3.80s/it] Training 1/2 epoch (loss 0.6133): 31%|███ | 259/840 [16:19<34:15, 3.54s/it] Training 1/2 epoch (loss 0.6641): 31%|███ | 259/840 [16:22<34:15, 3.54s/it] Training 1/2 epoch (loss 0.6641): 31%|███ | 260/840 [16:22<34:41, 3.59s/it] Training 1/2 epoch (loss 0.4961): 31%|███ | 260/840 [16:25<34:41, 3.59s/it] Training 1/2 epoch (loss 0.4961): 31%|███ | 261/840 [16:25<31:51, 3.30s/it] Training 1/2 epoch (loss 0.5273): 31%|███ | 261/840 [16:28<31:51, 3.30s/it] Training 1/2 epoch (loss 0.5273): 31%|███ | 262/840 [16:28<32:32, 3.38s/it] Training 1/2 epoch (loss 0.5430): 31%|███ | 262/840 [16:31<32:32, 3.38s/it] Training 1/2 epoch (loss 0.5430): 31%|███▏ | 263/840 [16:31<30:28, 3.17s/it] Training 1/2 epoch (loss 0.5469): 31%|███▏ | 263/840 [16:34<30:28, 3.17s/it] Training 1/2 epoch (loss 0.5469): 31%|███▏ | 264/840 [16:34<28:40, 2.99s/it] Training 1/2 epoch (loss 0.4570): 31%|███▏ | 264/840 [16:37<28:40, 2.99s/it] Training 1/2 epoch (loss 0.4570): 32%|███▏ | 265/840 [16:37<28:23, 2.96s/it] Training 1/2 epoch (loss 0.8672): 32%|███▏ | 265/840 [16:42<28:23, 2.96s/it] Training 1/2 epoch (loss 0.8672): 32%|███▏ | 266/840 [16:42<35:43, 3.73s/it] Training 1/2 epoch (loss 0.6367): 32%|███▏ | 266/840 [16:45<35:43, 3.73s/it] Training 1/2 epoch (loss 0.6367): 32%|███▏ | 267/840 [16:45<33:34, 3.52s/it] Training 1/2 epoch (loss 0.4766): 32%|███▏ | 267/840 [16:48<33:34, 3.52s/it] Training 1/2 epoch (loss 0.4766): 32%|███▏ | 268/840 [16:48<32:47, 3.44s/it] Training 1/2 epoch (loss 0.4961): 32%|███▏ | 268/840 [16:51<32:47, 3.44s/it] Training 1/2 epoch 
(loss 0.4961): 32%|███▏ | 269/840 [16:51<30:15, 3.18s/it] Training 1/2 epoch (loss 0.5000): 32%|███▏ | 269/840 [16:54<30:15, 3.18s/it] Training 1/2 epoch (loss 0.5000): 32%|███▏ | 270/840 [16:54<30:43, 3.23s/it] Training 1/2 epoch (loss 0.5352): 32%|███▏ | 270/840 [16:59<30:43, 3.23s/it] Training 1/2 epoch (loss 0.5352): 32%|███▏ | 271/840 [16:59<35:10, 3.71s/it] Training 1/2 epoch (loss 0.5156): 32%|███▏ | 271/840 [17:02<35:10, 3.71s/it] Training 1/2 epoch (loss 0.5156): 32%|███▏ | 272/840 [17:02<33:09, 3.50s/it] Training 1/2 epoch (loss 0.5000): 32%|███▏ | 272/840 [17:06<33:09, 3.50s/it] Training 1/2 epoch (loss 0.5000): 32%|███▎ | 273/840 [17:06<35:11, 3.72s/it] Training 1/2 epoch (loss 0.5039): 32%|███▎ | 273/840 [17:11<35:11, 3.72s/it] Training 1/2 epoch (loss 0.5039): 33%|███▎ | 274/840 [17:11<36:20, 3.85s/it] Training 1/2 epoch (loss 0.5625): 33%|███▎ | 274/840 [17:15<36:20, 3.85s/it] Training 1/2 epoch (loss 0.5625): 33%|███▎ | 275/840 [17:15<39:02, 4.15s/it] Training 1/2 epoch (loss 0.5156): 33%|███▎ | 275/840 [17:20<39:02, 4.15s/it] Training 1/2 epoch (loss 0.5156): 33%|███▎ | 276/840 [17:20<39:20, 4.18s/it] Training 1/2 epoch (loss 0.5977): 33%|███▎ | 276/840 [17:23<39:20, 4.18s/it] Training 1/2 epoch (loss 0.5977): 33%|███▎ | 277/840 [17:23<36:41, 3.91s/it] Training 1/2 epoch (loss 0.4336): 33%|███▎ | 277/840 [17:27<36:41, 3.91s/it] Training 1/2 epoch (loss 0.4336): 33%|███▎ | 278/840 [17:27<36:41, 3.92s/it] Training 1/2 epoch (loss 0.5586): 33%|███▎ | 278/840 [17:30<36:41, 3.92s/it] Training 1/2 epoch (loss 0.5586): 33%|███▎ | 279/840 [17:30<35:48, 3.83s/it] Training 1/2 epoch (loss 0.5273): 33%|███▎ | 279/840 [17:33<35:48, 3.83s/it] Training 1/2 epoch (loss 0.5273): 33%|███▎ | 280/840 [17:33<33:05, 3.55s/it] Training 1/2 epoch (loss 0.5781): 33%|███▎ | 280/840 
[17:37<33:05, 3.55s/it] Training 1/2 epoch (loss 0.5781): 33%|███▎ | 281/840 [17:37<32:55, 3.53s/it] Training 1/2 epoch (loss 0.6250): 33%|███▎ | 281/840 [17:40<32:55, 3.53s/it] Training 1/2 epoch (loss 0.6250): 34%|███▎ | 282/840 [17:40<31:44, 3.41s/it] Training 1/2 epoch (loss 0.5547): 34%|███▎ | 282/840 [17:44<31:44, 3.41s/it] Training 1/2 epoch (loss 0.5547): 34%|███▎ | 283/840 [17:44<33:14, 3.58s/it] Training 1/2 epoch (loss 0.5781): 34%|███▎ | 283/840 [17:47<33:14, 3.58s/it] Training 1/2 epoch (loss 0.5781): 34%|███▍ | 284/840 [17:47<31:11, 3.37s/it] Training 1/2 epoch (loss 0.5430): 34%|███▍ | 284/840 [17:51<31:11, 3.37s/it] Training 1/2 epoch (loss 0.5430): 34%|███▍ | 285/840 [17:51<32:10, 3.48s/it] Training 1/2 epoch (loss 0.5273): 34%|███▍ | 285/840 [17:54<32:10, 3.48s/it] Training 1/2 epoch (loss 0.5273): 34%|███▍ | 286/840 [17:54<32:11, 3.49s/it] Training 1/2 epoch (loss 0.4414): 34%|███▍ | 286/840 [17:58<32:11, 3.49s/it] Training 1/2 epoch (loss 0.4414): 34%|███▍ | 287/840 [17:58<32:22, 3.51s/it] Training 1/2 epoch (loss 0.5000): 34%|███▍ | 287/840 [18:01<32:22, 3.51s/it] Training 1/2 epoch (loss 0.5000): 34%|███▍ | 288/840 [18:01<30:29, 3.31s/it] Training 1/2 epoch (loss 0.5586): 34%|███▍ | 288/840 [18:04<30:29, 3.31s/it] Training 1/2 epoch (loss 0.5586): 34%|███▍ | 289/840 [18:04<29:46, 3.24s/it] Training 1/2 epoch (loss 0.5898): 34%|███▍ | 289/840 [18:07<29:46, 3.24s/it] Training 1/2 epoch (loss 0.5898): 35%|███▍ | 290/840 [18:07<30:50, 3.36s/it] Training 1/2 epoch (loss 0.5195): 35%|███▍ | 290/840 [18:12<30:50, 3.36s/it] Training 1/2 epoch (loss 0.5195): 35%|███▍ | 291/840 [18:12<33:19, 3.64s/it] Training 1/2 epoch (loss 0.4844): 35%|███▍ | 291/840 [18:16<33:19, 3.64s/it] Training 1/2 epoch (loss 0.4844): 35%|███▍ | 292/840 [18:16<36:47, 4.03s/it] Training 1/2 epoch 
(loss 0.6523): 35%|███▍ | 292/840 [18:21<36:47, 4.03s/it] Training 1/2 epoch (loss 0.6523): 35%|███▍ | 293/840 [18:21<37:39, 4.13s/it] Training 1/2 epoch (loss 0.5547): 35%|███▍ | 293/840 [18:24<37:39, 4.13s/it] Training 1/2 epoch (loss 0.5547): 35%|███▌ | 294/840 [18:24<34:43, 3.82s/it] Training 1/2 epoch (loss 0.5625): 35%|███▌ | 294/840 [18:27<34:43, 3.82s/it] Training 1/2 epoch (loss 0.5625): 35%|███▌ | 295/840 [18:27<31:44, 3.49s/it] Training 1/2 epoch (loss 0.5117): 35%|███▌ | 295/840 [18:30<31:44, 3.49s/it] Training 1/2 epoch (loss 0.5117): 35%|███▌ | 296/840 [18:30<31:47, 3.51s/it] Training 1/2 epoch (loss 0.5625): 35%|███▌ | 296/840 [18:34<31:47, 3.51s/it] Training 1/2 epoch (loss 0.5625): 35%|███▌ | 297/840 [18:34<32:10, 3.56s/it] Training 1/2 epoch (loss 0.4434): 35%|███▌ | 297/840 [18:38<32:10, 3.56s/it] Training 1/2 epoch (loss 0.4434): 35%|███▌ | 298/840 [18:38<32:44, 3.63s/it] Training 1/2 epoch (loss 0.5547): 35%|███▌ | 298/840 [18:41<32:44, 3.63s/it] Training 1/2 epoch (loss 0.5547): 36%|███▌ | 299/840 [18:41<32:18, 3.58s/it] Training 1/2 epoch (loss 0.5938): 36%|███▌ | 299/840 [18:44<32:18, 3.58s/it] Training 1/2 epoch (loss 0.5938): 36%|███▌ | 300/840 [18:44<30:37, 3.40s/it] Training 1/2 epoch (loss 0.5742): 36%|███▌ | 300/840 [18:49<30:37, 3.40s/it] Training 1/2 epoch (loss 0.5742): 36%|███▌ | 301/840 [18:49<34:22, 3.83s/it] Training 1/2 epoch (loss 0.5820): 36%|███▌ | 301/840 [18:53<34:22, 3.83s/it] Training 1/2 epoch (loss 0.5820): 36%|███▌ | 302/840 [18:53<34:19, 3.83s/it] Training 1/2 epoch (loss 0.4883): 36%|███▌ | 302/840 [18:58<34:19, 3.83s/it] Training 1/2 epoch (loss 0.4883): 36%|███▌ | 303/840 [18:58<38:48, 4.34s/it] Training 1/2 epoch (loss 0.6367): 36%|███▌ | 303/840 [19:02<38:48, 4.34s/it] Training 1/2 epoch (loss 0.6367): 36%|███▌ | 304/840 
[19:02<36:01, 4.03s/it] Training 1/2 epoch (loss 0.5156): 36%|███▌ | 304/840 [19:05<36:01, 4.03s/it] Training 1/2 epoch (loss 0.5156): 36%|███▋ | 305/840 [19:05<34:55, 3.92s/it] Training 1/2 epoch (loss 0.6445): 36%|███▋ | 305/840 [19:10<34:55, 3.92s/it] Training 1/2 epoch (loss 0.6445): 36%|███▋ | 306/840 [19:10<37:38, 4.23s/it] Training 1/2 epoch (loss 0.4141): 36%|███▋ | 306/840 [19:14<37:38, 4.23s/it] Training 1/2 epoch (loss 0.4141): 37%|███▋ | 307/840 [19:14<35:52, 4.04s/it] Training 1/2 epoch (loss 0.4805): 37%|███▋ | 307/840 [19:17<35:52, 4.04s/it] Training 1/2 epoch (loss 0.4805): 37%|███▋ | 308/840 [19:17<33:45, 3.81s/it] Training 1/2 epoch (loss 0.5547): 37%|███▋ | 308/840 [19:21<33:45, 3.81s/it] Training 1/2 epoch (loss 0.5547): 37%|███▋ | 309/840 [19:21<33:24, 3.77s/it] Training 1/2 epoch (loss 0.5039): 37%|███▋ | 309/840 [19:24<33:24, 3.77s/it] Training 1/2 epoch (loss 0.5039): 37%|███▋ | 310/840 [19:24<31:46, 3.60s/it] Training 1/2 epoch (loss 0.5312): 37%|███▋ | 310/840 [19:28<31:46, 3.60s/it] Training 1/2 epoch (loss 0.5312): 37%|███▋ | 311/840 [19:28<31:44, 3.60s/it] Training 1/2 epoch (loss 0.4941): 37%|███▋ | 311/840 [19:31<31:44, 3.60s/it] Training 1/2 epoch (loss 0.4941): 37%|███▋ | 312/840 [19:31<31:02, 3.53s/it] Training 1/2 epoch (loss 0.5547): 37%|███▋ | 312/840 [19:33<31:02, 3.53s/it] Training 1/2 epoch (loss 0.5547): 37%|███▋ | 313/840 [19:33<28:15, 3.22s/it] Training 1/2 epoch (loss 0.3809): 37%|███▋ | 313/840 [19:38<28:15, 3.22s/it] Training 1/2 epoch (loss 0.3809): 37%|███▋ | 314/840 [19:38<30:37, 3.49s/it] Training 1/2 epoch (loss 0.5078): 37%|███▋ | 314/840 [19:42<30:37, 3.49s/it] Training 1/2 epoch (loss 0.5078): 38%|███▊ | 315/840 [19:42<32:55, 3.76s/it] Training 1/2 epoch (loss 0.5469): 38%|███▊ | 315/840 [19:45<32:55, 3.76s/it] Training 1/2 epoch 
(loss 0.5469): 38%|███▊ | 316/840 [19:45<32:19, 3.70s/it] Training 1/2 epoch (loss 0.5742): 38%|███▊ | 316/840 [19:50<32:19, 3.70s/it] Training 1/2 epoch (loss 0.5742): 38%|███▊ | 317/840 [19:50<33:53, 3.89s/it] Training 1/2 epoch (loss 0.6328): 38%|███▊ | 317/840 [19:53<33:53, 3.89s/it] Training 1/2 epoch (loss 0.6328): 38%|███▊ | 318/840 [19:53<31:55, 3.67s/it] Training 1/2 epoch (loss 0.4883): 38%|███▊ | 318/840 [19:58<31:55, 3.67s/it] Training 1/2 epoch (loss 0.4883): 38%|███▊ | 319/840 [19:58<36:24, 4.19s/it] Training 1/2 epoch (loss 0.5000): 38%|███▊ | 319/840 [20:01<36:24, 4.19s/it] Training 1/2 epoch (loss 0.5000): 38%|███▊ | 320/840 [20:01<33:04, 3.82s/it] Training 1/2 epoch (loss 0.5078): 38%|███▊ | 320/840 [20:07<33:04, 3.82s/it] Training 1/2 epoch (loss 0.5078): 38%|███▊ | 321/840 [20:07<37:00, 4.28s/it] Training 1/2 epoch (loss 0.4727): 38%|███▊ | 321/840 [20:11<37:00, 4.28s/it] Training 1/2 epoch (loss 0.4727): 38%|███▊ | 322/840 [20:11<37:27, 4.34s/it] Training 1/2 epoch (loss 0.5898): 38%|███▊ | 322/840 [20:16<37:27, 4.34s/it] Training 1/2 epoch (loss 0.5898): 38%|███▊ | 323/840 [20:16<38:02, 4.41s/it] Training 1/2 epoch (loss 0.5977): 38%|███▊ | 323/840 [20:19<38:02, 4.41s/it] Training 1/2 epoch (loss 0.5977): 39%|███▊ | 324/840 [20:19<35:35, 4.14s/it] Training 1/2 epoch (loss 0.5742): 39%|███▊ | 324/840 [20:22<35:35, 4.14s/it] Training 1/2 epoch (loss 0.5742): 39%|███▊ | 325/840 [20:22<32:21, 3.77s/it] Training 1/2 epoch (loss 0.5938): 39%|███▊ | 325/840 [20:27<32:21, 3.77s/it] Training 1/2 epoch (loss 0.5938): 39%|███▉ | 326/840 [20:27<34:49, 4.06s/it] Training 1/2 epoch (loss 0.4590): 39%|███▉ | 326/840 [20:32<34:49, 4.06s/it] Training 1/2 epoch (loss 0.4590): 39%|███▉ | 327/840 [20:32<38:13, 4.47s/it] Training 1/2 epoch (loss 0.4297): 39%|███▉ | 327/840 
[20:36<38:13, 4.47s/it] Training 1/2 epoch (loss 0.4297): 39%|███▉ | 328/840 [20:36<36:26, 4.27s/it] Training 1/2 epoch (loss 0.4004): 39%|███▉ | 328/840 [20:40<36:26, 4.27s/it] Training 1/2 epoch (loss 0.4004): 39%|███▉ | 329/840 [20:40<34:04, 4.00s/it] Training 1/2 epoch (loss 0.4961): 39%|███▉ | 329/840 [20:43<34:04, 4.00s/it] Training 1/2 epoch (loss 0.4961): 39%|███▉ | 330/840 [20:43<31:47, 3.74s/it] Training 1/2 epoch (loss 0.4863): 39%|███▉ | 330/840 [20:46<31:47, 3.74s/it] Training 1/2 epoch (loss 0.4863): 39%|███▉ | 331/840 [20:46<30:15, 3.57s/it] Training 1/2 epoch (loss 0.5859): 39%|███▉ | 331/840 [20:48<30:15, 3.57s/it] Training 1/2 epoch (loss 0.5859): 40%|███▉ | 332/840 [20:48<27:50, 3.29s/it] Training 1/2 epoch (loss 0.5859): 40%|███▉ | 332/840 [20:54<27:50, 3.29s/it] Training 1/2 epoch (loss 0.5859): 40%|███▉ | 333/840 [20:54<33:17, 3.94s/it] Training 1/2 epoch (loss 0.6484): 40%|███▉ | 333/840 [20:59<33:17, 3.94s/it] Training 1/2 epoch (loss 0.6484): 40%|███▉ | 334/840 [20:59<37:23, 4.43s/it] Training 1/2 epoch (loss 0.5078): 40%|███▉ | 334/840 [21:05<37:23, 4.43s/it] Training 1/2 epoch (loss 0.5078): 40%|███▉ | 335/840 [21:05<40:00, 4.75s/it] Training 1/2 epoch (loss 0.4844): 40%|███▉ | 335/840 [21:09<40:00, 4.75s/it] Training 1/2 epoch (loss 0.4844): 40%|████ | 336/840 [21:09<36:58, 4.40s/it] Training 1/2 epoch (loss 0.6250): 40%|████ | 336/840 [21:12<36:58, 4.40s/it] Training 1/2 epoch (loss 0.6250): 40%|████ | 337/840 [21:12<35:15, 4.21s/it] Training 1/2 epoch (loss 0.5781): 40%|████ | 337/840 [21:15<35:15, 4.21s/it] Training 1/2 epoch (loss 0.5781): 40%|████ | 338/840 [21:15<32:30, 3.89s/it] Training 1/2 epoch (loss 0.4727): 40%|████ | 338/840 [21:19<32:30, 3.89s/it] Training 1/2 epoch (loss 0.4727): 40%|████ | 339/840 [21:19<31:08, 3.73s/it] Training 1/2 epoch 
(loss 0.5820): 40%|████ | 339/840 [21:23<31:08, 3.73s/it] Training 1/2 epoch (loss 0.5820): 40%|████ | 340/840 [21:23<31:56, 3.83s/it] Training 1/2 epoch (loss 0.4844): 40%|████ | 340/840 [21:26<31:56, 3.83s/it] Training 1/2 epoch (loss 0.4844): 41%|████ | 341/840 [21:26<30:07, 3.62s/it] Training 1/2 epoch (loss 0.5938): 41%|████ | 341/840 [21:30<30:07, 3.62s/it] Training 1/2 epoch (loss 0.5938): 41%|████ | 342/840 [21:30<30:40, 3.69s/it] Training 1/2 epoch (loss 0.6602): 41%|████ | 342/840 [21:33<30:40, 3.69s/it] Training 1/2 epoch (loss 0.6602): 41%|████ | 343/840 [21:33<28:57, 3.50s/it] Training 1/2 epoch (loss 0.5352): 41%|████ | 343/840 [21:38<28:57, 3.50s/it] Training 1/2 epoch (loss 0.5352): 41%|████ | 344/840 [21:38<31:36, 3.82s/it] Training 1/2 epoch (loss 0.6094): 41%|████ | 344/840 [21:40<31:36, 3.82s/it] Training 1/2 epoch (loss 0.6094): 41%|████ | 345/840 [21:40<28:35, 3.47s/it] Training 1/2 epoch (loss 0.5156): 41%|████ | 345/840 [21:43<28:35, 3.47s/it] Training 1/2 epoch (loss 0.5156): 41%|████ | 346/840 [21:43<26:27, 3.21s/it] Training 1/2 epoch (loss 0.5039): 41%|████ | 346/840 [21:45<26:27, 3.21s/it] Training 1/2 epoch (loss 0.5039): 41%|████▏ | 347/840 [21:45<25:09, 3.06s/it] Training 1/2 epoch (loss 0.4668): 41%|████▏ | 347/840 [21:49<25:09, 3.06s/it] Training 1/2 epoch (loss 0.4668): 41%|████▏ | 348/840 [21:49<25:40, 3.13s/it] Training 1/2 epoch (loss 0.5781): 41%|████▏ | 348/840 [21:53<25:40, 3.13s/it] Training 1/2 epoch (loss 0.5781): 42%|████▏ | 349/840 [21:53<27:48, 3.40s/it] Training 1/2 epoch (loss 0.4492): 42%|████▏ | 349/840 [21:56<27:48, 3.40s/it] Training 1/2 epoch (loss 0.4492): 42%|████▏ | 350/840 [21:56<26:15, 3.22s/it] Training 1/2 epoch (loss 0.4473): 42%|████▏ | 350/840 [22:01<26:15, 3.22s/it] Training 1/2 epoch (loss 0.4473): 
42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 351/840 [22:01<31:38, 3.88s/it] Training 1/2 epoch (loss 0.4844): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 351/840 [22:04<31:38, 3.88s/it] Training 1/2 epoch (loss 0.4844): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 352/840 [22:04<29:36, 3.64s/it] Training 1/2 epoch (loss 0.4922): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 352/840 [22:08<29:36, 3.64s/it] Training 1/2 epoch (loss 0.4922): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 353/840 [22:08<29:36, 3.65s/it] Training 1/2 epoch (loss 0.5430): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 353/840 [22:11<29:36, 3.65s/it] Training 1/2 epoch (loss 0.5430): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 354/840 [22:11<29:15, 3.61s/it] Training 1/2 epoch (loss 0.5469): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 354/840 [22:14<29:15, 3.61s/it] Training 1/2 epoch (loss 0.5469): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 355/840 [22:14<28:07, 3.48s/it] Training 1/2 epoch (loss 0.5039): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 355/840 [22:18<28:07, 3.48s/it] Training 1/2 epoch (loss 0.5039): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 356/840 [22:18<29:20, 3.64s/it] Training 1/2 epoch (loss 0.4473): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 356/840 [22:23<29:20, 3.64s/it] Training 1/2 epoch (loss 0.4473): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 357/840 [22:23<32:16, 4.01s/it] Training 1/2 epoch (loss 0.5156): 42%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 357/840 [22:28<32:16, 4.01s/it] Training 1/2 epoch (loss 0.5156): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 358/840 [22:28<33:35, 4.18s/it] Training 1/2 epoch (loss 0.4766): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 358/840 [22:33<33:35, 4.18s/it] Training 1/2 epoch (loss 0.4766): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 359/840 [22:33<36:34, 4.56s/it] Training 1/2 epoch (loss 0.5312): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 359/840 [22:37<36:34, 4.56s/it] Training 1/2 epoch (loss 0.5312): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 360/840 [22:37<35:15, 4.41s/it] Training 1/2 epoch (loss 0.4395): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 360/840 [22:41<35:15, 4.41s/it] Training 1/2 epoch (loss 0.4395): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 361/840 [22:41<34:10, 4.28s/it] Training 1/2 epoch (loss 0.4766): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 361/840 [22:44<34:10, 4.28s/it] Training 1/2 epoch (loss 0.4766): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 362/840 [22:44<30:47, 3.86s/it] Training 
1/2 epoch (loss 0.4785): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 362/840 [22:48<30:47, 3.86s/it] Training 1/2 epoch (loss 0.4785): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 363/840 [22:48<29:54, 3.76s/it] Training 1/2 epoch (loss 0.4141): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 363/840 [22:53<29:54, 3.76s/it] Training 1/2 epoch (loss 0.4141): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 364/840 [22:53<33:02, 4.17s/it] Training 1/2 epoch (loss 0.5547): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 364/840 [22:57<33:02, 4.17s/it] Training 1/2 epoch (loss 0.5547): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 365/840 [22:57<31:44, 4.01s/it] Training 1/2 epoch (loss 0.5703): 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 365/840 [23:02<31:44, 4.01s/it] Training 1/2 epoch (loss 0.5703): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 366/840 [23:02<35:11, 4.45s/it] Training 1/2 epoch (loss 0.3711): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 366/840 [23:06<35:11, 4.45s/it] Training 1/2 epoch (loss 0.3711): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 367/840 [23:06<32:59, 4.18s/it] Training 1/2 epoch (loss 0.5078): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 367/840 [23:10<32:59, 4.18s/it] Training 1/2 epoch (loss 0.5078): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 368/840 [23:10<33:55, 4.31s/it] Training 1/2 epoch (loss 0.6055): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 368/840 [23:13<33:55, 4.31s/it] Training 1/2 epoch (loss 0.6055): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 369/840 [23:13<31:07, 3.97s/it] Training 1/2 epoch (loss 0.7266): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 369/840 [23:19<31:07, 3.97s/it] Training 1/2 epoch (loss 0.7266): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 370/840 [23:19<34:43, 4.43s/it] Training 1/2 epoch (loss 0.6250): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 370/840 [23:24<34:43, 4.43s/it] Training 1/2 epoch (loss 0.6250): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 371/840 [23:24<37:04, 4.74s/it] Training 1/2 epoch (loss 0.3438): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 371/840 [23:30<37:04, 4.74s/it] Training 1/2 epoch (loss 0.3438): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 372/840 [23:30<38:45, 4.97s/it] Training 1/2 epoch (loss 0.4219): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 372/840 [23:33<38:45, 4.97s/it] Training 1/2 epoch (loss 0.4219): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 373/840 [23:33<35:04, 4.51s/it] Training 1/2 epoch (loss 0.4883): 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 373/840 
[23:37<35:04, 4.51s/it] Training 1/2 epoch (loss 0.4883): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 374/840 [23:37<32:47, 4.22s/it] Training 1/2 epoch (loss 0.5078): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 374/840 [23:41<32:47, 4.22s/it] Training 1/2 epoch (loss 0.5078): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 375/840 [23:41<33:26, 4.32s/it] Training 1/2 epoch (loss 0.5469): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 375/840 [23:46<33:26, 4.32s/it] Training 1/2 epoch (loss 0.5469): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 376/840 [23:46<34:40, 4.48s/it] Training 1/2 epoch (loss 0.4297): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 376/840 [23:50<34:40, 4.48s/it] Training 1/2 epoch (loss 0.4297): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 377/840 [23:50<33:26, 4.33s/it] Training 1/2 epoch (loss 0.4062): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 377/840 [23:54<33:26, 4.33s/it] Training 1/2 epoch (loss 0.4062): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 378/840 [23:54<32:04, 4.17s/it] Training 1/2 epoch (loss 0.4961): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 378/840 [23:57<32:04, 4.17s/it] Training 1/2 epoch (loss 0.4961): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 379/840 [23:57<28:38, 3.73s/it] Training 1/2 epoch (loss 0.4512): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 379/840 [24:02<28:38, 3.73s/it] Training 1/2 epoch (loss 0.4512): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 380/840 [24:02<31:06, 4.06s/it] Training 1/2 epoch (loss 0.5898): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 380/840 [24:04<31:06, 4.06s/it] Training 1/2 epoch (loss 0.5898): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 381/840 [24:04<28:12, 3.69s/it] Training 1/2 epoch (loss 0.4219): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 381/840 [24:09<28:12, 3.69s/it] Training 1/2 epoch (loss 0.4219): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 382/840 [24:09<30:58, 4.06s/it] Training 1/2 epoch (loss 0.5312): 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 382/840 [24:12<30:58, 4.06s/it] Training 1/2 epoch (loss 0.5312): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 383/840 [24:12<28:55, 3.80s/it] Training 1/2 epoch (loss 0.4570): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 383/840 [24:16<28:55, 3.80s/it] Training 1/2 epoch (loss 0.4570): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 384/840 [24:16<28:24, 3.74s/it] Training 1/2 epoch (loss 0.4922): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 384/840 [24:19<28:24, 3.74s/it] Training 1/2 epoch (loss 0.4922): 
46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 385/840 [24:19<26:48, 3.54s/it] Training 1/2 epoch (loss 0.5703): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 385/840 [24:22<26:48, 3.54s/it] Training 1/2 epoch (loss 0.5703): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 386/840 [24:22<25:39, 3.39s/it] Training 1/2 epoch (loss 0.5859): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 386/840 [24:25<25:39, 3.39s/it] Training 1/2 epoch (loss 0.5859): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 387/840 [24:25<24:14, 3.21s/it] Training 1/2 epoch (loss 0.5078): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 387/840 [24:29<24:14, 3.21s/it] Training 1/2 epoch (loss 0.5078): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 388/840 [24:29<26:35, 3.53s/it] Training 1/2 epoch (loss 0.4219): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 388/840 [24:32<26:35, 3.53s/it] Training 1/2 epoch (loss 0.4219): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 389/840 [24:32<24:53, 3.31s/it] Training 1/2 epoch (loss 0.3906): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 389/840 [24:35<24:53, 3.31s/it] Training 1/2 epoch (loss 0.3906): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 390/840 [24:35<24:15, 3.23s/it] Training 1/2 epoch (loss 0.5312): 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 390/840 [24:38<24:15, 3.23s/it] Training 1/2 epoch (loss 0.5312): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 391/840 [24:38<24:00, 3.21s/it] Training 1/2 epoch (loss 0.4570): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 391/840 [24:41<24:00, 3.21s/it] Training 1/2 epoch (loss 0.4570): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 392/840 [24:41<22:15, 2.98s/it] Training 1/2 epoch (loss 0.5703): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 392/840 [24:44<22:15, 2.98s/it] Training 1/2 epoch (loss 0.5703): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 393/840 [24:44<23:22, 3.14s/it] Training 1/2 epoch (loss 0.5117): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 393/840 [24:47<23:22, 3.14s/it] Training 1/2 epoch (loss 0.5117): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 394/840 [24:47<23:18, 3.14s/it] Training 1/2 epoch (loss 0.4570): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 394/840 [24:50<23:18, 3.14s/it] Training 1/2 epoch (loss 0.4570): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 395/840 [24:50<22:05, 2.98s/it] Training 1/2 epoch (loss 0.4512): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 395/840 [24:53<22:05, 2.98s/it] Training 1/2 epoch (loss 0.4512): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 396/840 [24:53<22:55, 3.10s/it] Training 
1/2 epoch (loss 0.4414): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 396/840 [24:57<22:55, 3.10s/it] Training 1/2 epoch (loss 0.4414): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 397/840 [24:57<24:40, 3.34s/it] Training 1/2 epoch (loss 0.4180): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 397/840 [25:01<24:40, 3.34s/it] Training 1/2 epoch (loss 0.4180): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 398/840 [25:01<24:47, 3.37s/it] Training 1/2 epoch (loss 0.6211): 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 398/840 [25:06<24:47, 3.37s/it] Training 1/2 epoch (loss 0.6211): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 399/840 [25:06<29:22, 4.00s/it] Training 1/2 epoch (loss 0.5000): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 399/840 [25:12<29:22, 4.00s/it] Training 1/2 epoch (loss 0.5000): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 400/840 [25:12<32:51, 4.48s/it] Training 1/2 epoch (loss 0.4629): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 400/840 [25:16<32:51, 4.48s/it] Training 1/2 epoch (loss 0.4629): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 401/840 [25:16<31:37, 4.32s/it] Training 1/2 epoch (loss 0.4375): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 401/840 [25:21<31:37, 4.32s/it] Training 1/2 epoch (loss 0.4375): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 402/840 [25:21<34:13, 4.69s/it] Training 1/2 epoch (loss 0.4863): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 402/840 [25:27<34:13, 4.69s/it] Training 1/2 epoch (loss 0.4863): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 403/840 [25:27<35:50, 4.92s/it] Training 1/2 epoch (loss 0.4688): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 403/840 [25:30<35:50, 4.92s/it] Training 1/2 epoch (loss 0.4688): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 404/840 [25:30<31:37, 4.35s/it] Training 1/2 epoch (loss 0.5898): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 404/840 [25:32<31:37, 4.35s/it] Training 1/2 epoch (loss 0.5898): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 405/840 [25:32<27:29, 3.79s/it] Training 1/2 epoch (loss 0.5586): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 405/840 [25:37<27:29, 3.79s/it] Training 1/2 epoch (loss 0.5586): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 406/840 [25:37<29:17, 4.05s/it] Training 1/2 epoch (loss 0.5078): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 406/840 [25:42<29:17, 4.05s/it] Training 1/2 epoch (loss 0.5078): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 407/840 [25:42<30:35, 4.24s/it] Training 1/2 epoch (loss 0.5000): 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 407/840 
[25:46<30:35, 4.24s/it] Training 1/2 epoch (loss 0.5000): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 408/840 [25:46<31:43, 4.41s/it] Training 1/2 epoch (loss 0.4844): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 408/840 [25:49<31:43, 4.41s/it] Training 1/2 epoch (loss 0.4844): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 409/840 [25:49<27:51, 3.88s/it] Training 1/2 epoch (loss 0.4004): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 409/840 [25:55<27:51, 3.88s/it] Training 1/2 epoch (loss 0.4004): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 410/840 [25:55<31:27, 4.39s/it] Training 1/2 epoch (loss 0.5234): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 410/840 [25:58<31:27, 4.39s/it] Training 1/2 epoch (loss 0.5234): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 411/840 [25:58<28:14, 3.95s/it] Training 1/2 epoch (loss 0.4961): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 411/840 [26:01<28:14, 3.95s/it] Training 1/2 epoch (loss 0.4961): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 412/840 [26:01<26:42, 3.74s/it] Training 1/2 epoch (loss 0.4492): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 412/840 [26:04<26:42, 3.74s/it] Training 1/2 epoch (loss 0.4492): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 413/840 [26:04<25:13, 3.54s/it] Training 1/2 epoch (loss 0.4668): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 413/840 [26:08<25:13, 3.54s/it] Training 1/2 epoch (loss 0.4668): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 414/840 [26:08<26:55, 3.79s/it] Training 1/2 epoch (loss 0.4316): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 414/840 [26:12<26:55, 3.79s/it] Training 1/2 epoch (loss 0.4316): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 415/840 [26:12<27:47, 3.92s/it] Training 1/2 epoch (loss 0.5195): 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 415/840 [26:17<27:47, 3.92s/it] Training 1/2 epoch (loss 0.5195): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 416/840 [26:17<29:24, 4.16s/it] Training 1/2 epoch (loss 0.4922): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 416/840 [26:21<29:24, 4.16s/it] Training 1/2 epoch (loss 0.4922): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 417/840 [26:21<27:36, 3.92s/it] Training 1/2 epoch (loss 0.4082): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 417/840 [26:24<27:36, 3.92s/it] Training 1/2 epoch (loss 0.4082): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 418/840 [26:24<27:16, 3.88s/it] Training 1/2 epoch (loss 0.5820): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 418/840 [26:27<27:16, 3.88s/it] Training 1/2 epoch (loss 0.5820): 
50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 419/840 [26:27<25:34, 3.64s/it] Training 1/2 epoch (loss 0.4297): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 419/840 [26:31<25:34, 3.64s/it] Training 1/2 epoch (loss 0.4297): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 420/840 [26:31<25:35, 3.66s/it] Training 2/2 epoch (loss 0.5664): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 420/840 [26:34<25:35, 3.66s/it] Training 2/2 epoch (loss 0.5664): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 421/840 [26:34<23:03, 3.30s/it] Training 2/2 epoch (loss 0.4414): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 421/840 [26:37<23:03, 3.30s/it] Training 2/2 epoch (loss 0.4414): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 422/840 [26:37<23:57, 3.44s/it] Training 2/2 epoch (loss 0.4531): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 422/840 [26:40<23:57, 3.44s/it] Training 2/2 epoch (loss 0.4531): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 423/840 [26:40<22:54, 3.30s/it] Training 2/2 epoch (loss 0.4219): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 423/840 [26:43<22:54, 3.30s/it] Training 2/2 epoch (loss 0.4219): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 424/840 [26:43<22:08, 3.19s/it] Training 2/2 epoch (loss 0.4805): 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 424/840 [26:48<22:08, 3.19s/it] Training 2/2 epoch (loss 0.4805): 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 425/840 [26:48<24:47, 3.58s/it] Training 2/2 epoch (loss 0.4824): 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 425/840 [26:51<24:47, 3.58s/it] Training 2/2 epoch (loss 0.4824): 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 426/840 [26:51<24:30, 3.55s/it] Training 2/2 epoch (loss 0.5469): 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 426/840 [26:55<24:30, 3.55s/it] Training 2/2 epoch (loss 0.5469): 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 427/840 [26:55<25:04, 3.64s/it] Training 2/2 epoch (loss 0.4316): 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 427/840 [27:01<25:04, 3.64s/it] Training 2/2 epoch (loss 0.4316): 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 428/840 [27:01<28:56, 4.22s/it] Training 2/2 epoch (loss 0.5508): 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 428/840 [27:05<28:56, 4.22s/it] Training 2/2 epoch (loss 0.5508): 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 429/840 [27:05<28:23, 4.14s/it] Training 2/2 epoch (loss 0.4492): 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 429/840 [27:08<28:23, 4.14s/it] Training 2/2 epoch (loss 0.4492): 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 430/840 [27:08<26:10, 3.83s/it] Training 
2/2 epoch (loss 0.6016): 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 430/840 [27:10<26:10, 3.83s/it] Training 2/2 epoch (loss 0.6016): 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 431/840 [27:10<23:31, 3.45s/it] Training 2/2 epoch (loss 0.4941): 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 431/840 [27:14<23:31, 3.45s/it] Training 2/2 epoch (loss 0.4941): 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 432/840 [27:14<24:52, 3.66s/it] Training 2/2 epoch (loss 0.5391): 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 432/840 [27:17<24:52, 3.66s/it] Training 2/2 epoch (loss 0.5391): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 433/840 [27:17<22:42, 3.35s/it] Training 2/2 epoch (loss 0.5312): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 433/840 [27:21<22:42, 3.35s/it] Training 2/2 epoch (loss 0.5312): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 434/840 [27:21<23:05, 3.41s/it] Training 2/2 epoch (loss 0.3555): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 434/840 [27:23<23:05, 3.41s/it] Training 2/2 epoch (loss 0.3555): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 435/840 [27:23<22:00, 3.26s/it] Training 2/2 epoch (loss 0.4941): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 435/840 [27:27<22:00, 3.26s/it] Training 2/2 epoch (loss 0.4941): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 436/840 [27:27<22:26, 3.33s/it] Training 2/2 epoch (loss 0.6172): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 436/840 [27:30<22:26, 3.33s/it] Training 2/2 epoch (loss 0.6172): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 437/840 [27:30<21:27, 3.20s/it] Training 2/2 epoch (loss 0.4707): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 437/840 [27:33<21:27, 3.20s/it] Training 2/2 epoch (loss 0.4707): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 438/840 [27:33<21:37, 3.23s/it] Training 2/2 epoch (loss 0.4922): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 438/840 [27:37<21:37, 3.23s/it] Training 2/2 epoch (loss 0.4922): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 439/840 [27:37<23:33, 3.53s/it] Training 2/2 epoch (loss 0.5039): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 439/840 [27:41<23:33, 3.53s/it] Training 2/2 epoch (loss 0.5039): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 440/840 [27:41<24:07, 3.62s/it] Training 2/2 epoch (loss 0.5352): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 440/840 [27:45<24:07, 3.62s/it] Training 2/2 epoch (loss 0.5352): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 441/840 [27:45<24:06, 3.62s/it] Training 
2/2 epoch (loss 0.5273): 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 441/840 [27:48<24:06, 3.62s/it] Training 2/2 epoch (loss 0.5273): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 442/840 [27:48<22:18, 3.36s/it] Training 2/2 epoch (loss 0.4746): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 442/840 [27:50<22:18, 3.36s/it] Training 2/2 epoch (loss 0.4746): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 443/840 [27:50<20:57, 3.17s/it] Training 2/2 epoch (loss 0.4648): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 443/840 [27:54<20:57, 3.17s/it] Training 2/2 epoch (loss 0.4648): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 444/840 [27:54<22:08, 3.36s/it] Training 2/2 epoch (loss 0.4609): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 444/840 [27:59<22:08, 3.36s/it] Training 2/2 epoch (loss 0.4609): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 445/840 [27:59<24:50, 3.77s/it] Training 2/2 epoch (loss 0.4258): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 445/840 [28:04<24:50, 3.77s/it] Training 2/2 epoch (loss 0.4258): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 446/840 [28:04<27:11, 4.14s/it] Training 2/2 epoch (loss 0.3730): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 446/840 [28:07<27:11, 4.14s/it] Training 2/2 epoch (loss 0.3730): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 447/840 [28:07<25:43, 3.93s/it] Training 2/2 epoch (loss 0.3203): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 447/840 [28:11<25:43, 3.93s/it] Training 2/2 epoch (loss 0.3203): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 448/840 [28:11<24:31, 3.75s/it] Training 2/2 epoch (loss 0.3047): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 448/840 [28:13<24:31, 3.75s/it] Training 2/2 epoch (loss 0.3047): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 449/840 [28:13<22:02, 3.38s/it] Training 2/2 epoch (loss 0.3945): 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 449/840 [28:18<22:02, 3.38s/it] Training 2/2 epoch (loss 0.3945): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 450/840 [28:18<25:03, 3.86s/it] Training 2/2 epoch (loss 0.4473): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 450/840 [28:23<25:03, 3.86s/it] Training 2/2 epoch (loss 0.4473): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 451/840 [28:23<26:28, 4.08s/it] Training 2/2 epoch (loss 0.4277): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 451/840 [28:26<26:28, 4.08s/it] Training 2/2 epoch (loss 0.4277): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 452/840 [28:26<25:18, 3.91s/it] 
Training 2/2 epoch (loss 0.3359): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 452/840 [28:32<25:18, 3.91s/it] Training 2/2 epoch (loss 0.3359): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 453/840 [28:32<28:06, 4.36s/it] Training 2/2 epoch (loss 0.2773): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 453/840 [28:35<28:06, 4.36s/it] Training 2/2 epoch (loss 0.2773): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 454/840 [28:35<26:16, 4.08s/it] Training 2/2 epoch (loss 0.2598): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 454/840 [28:40<26:16, 4.08s/it] Training 2/2 epoch (loss 0.2598): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 455/840 [28:40<27:08, 4.23s/it] Training 2/2 epoch (loss 0.3320): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 455/840 [28:43<27:08, 4.23s/it] Training 2/2 epoch (loss 0.3320): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 456/840 [28:43<25:50, 4.04s/it] Training 2/2 epoch (loss 0.2656): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 456/840 [28:46<25:50, 4.04s/it] Training 2/2 epoch (loss 0.2656): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 457/840 [28:46<24:06, 3.78s/it] Training 2/2 epoch (loss 0.3066): 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 457/840 [28:50<24:06, 3.78s/it] Training 2/2 epoch (loss 0.3066): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 458/840 [28:50<22:49, 3.58s/it] Training 2/2 epoch (loss 0.2559): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 458/840 [28:54<22:49, 3.58s/it] Training 2/2 epoch (loss 0.2559): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 459/840 [28:54<24:53, 3.92s/it] Training 2/2 epoch (loss 0.1855): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 459/840 [28:59<24:53, 3.92s/it] Training 2/2 epoch (loss 0.1855): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 460/840 [28:59<27:09, 4.29s/it] Training 2/2 epoch (loss 0.2266): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 460/840 [29:04<27:09, 4.29s/it] Training 2/2 epoch (loss 0.2266): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 461/840 [29:04<27:15, 4.31s/it] Training 2/2 epoch (loss 0.1865): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 461/840 [29:07<27:15, 4.31s/it] Training 2/2 epoch (loss 0.1865): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 462/840 [29:07<25:49, 4.10s/it] Training 2/2 epoch (loss 0.2949): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 462/840 [29:12<25:49, 4.10s/it] Training 2/2 epoch (loss 0.2949): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 463/840 [29:12<26:23, 
4.20s/it] Training 2/2 epoch (loss 0.4043): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 463/840 [29:16<26:23, 4.20s/it] Training 2/2 epoch (loss 0.4043): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 464/840 [29:16<26:14, 4.19s/it] Training 2/2 epoch (loss 0.2559): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 464/840 [29:19<26:14, 4.19s/it] Training 2/2 epoch (loss 0.2559): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 465/840 [29:19<23:08, 3.70s/it] Training 2/2 epoch (loss 0.3398): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 465/840 [29:24<23:08, 3.70s/it] Training 2/2 epoch (loss 0.3398): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 466/840 [29:24<26:36, 4.27s/it] Training 2/2 epoch (loss 0.3789): 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 466/840 [29:28<26:36, 4.27s/it] Training 2/2 epoch (loss 0.3789): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 467/840 [29:28<25:31, 4.11s/it] Training 2/2 epoch (loss 0.1738): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 467/840 [29:31<25:31, 4.11s/it] Training 2/2 epoch (loss 0.1738): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 468/840 [29:31<23:29, 3.79s/it] Training 2/2 epoch (loss 0.5859): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 468/840 [29:34<23:29, 3.79s/it] Training 2/2 epoch (loss 0.5859): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 469/840 [29:34<21:13, 3.43s/it] Training 2/2 epoch (loss 0.4824): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 469/840 [29:36<21:13, 3.43s/it] Training 2/2 epoch (loss 0.4824): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 470/840 [29:36<19:50, 3.22s/it] Training 2/2 epoch (loss 0.4863): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 470/840 [29:40<19:50, 3.22s/it] Training 2/2 epoch (loss 0.4863): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 471/840 [29:40<21:26, 3.49s/it] Training 2/2 epoch (loss 0.3633): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 471/840 [29:44<21:26, 3.49s/it] Training 2/2 epoch (loss 0.3633): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 472/840 [29:44<20:53, 3.41s/it] Training 2/2 epoch (loss 0.1660): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 472/840 [29:46<20:53, 3.41s/it] Training 2/2 epoch (loss 0.1660): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 473/840 [29:46<19:55, 3.26s/it] Training 2/2 epoch (loss 0.2852): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 473/840 [29:50<19:55, 3.26s/it] Training 2/2 epoch (loss 0.2852): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 474/840 
[29:50<19:38, 3.22s/it] Training 2/2 epoch (loss 0.2109): 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 474/840 [29:53<19:38, 3.22s/it] Training 2/2 epoch (loss 0.2109): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 475/840 [29:53<19:59, 3.29s/it] Training 2/2 epoch (loss 0.3262): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 475/840 [29:56<19:59, 3.29s/it] Training 2/2 epoch (loss 0.3262): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 476/840 [29:56<19:39, 3.24s/it] Training 2/2 epoch (loss 0.1973): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 476/840 [29:59<19:39, 3.24s/it] Training 2/2 epoch (loss 0.1973): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 477/840 [29:59<19:10, 3.17s/it] Training 2/2 epoch (loss 0.1631): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 477/840 [30:03<19:10, 3.17s/it] Training 2/2 epoch (loss 0.1631): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 478/840 [30:03<19:47, 3.28s/it] Training 2/2 epoch (loss 0.2734): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 478/840 [30:07<19:47, 3.28s/it] Training 2/2 epoch (loss 0.2734): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 479/840 [30:07<22:11, 3.69s/it] Training 2/2 epoch (loss 0.2119): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 479/840 [30:13<22:11, 3.69s/it] Training 2/2 epoch (loss 0.2119): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 480/840 [30:13<25:15, 4.21s/it] Training 2/2 epoch (loss 0.1836): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 480/840 [30:16<25:15, 4.21s/it] Training 2/2 epoch (loss 0.1836): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 481/840 [30:16<23:22, 3.91s/it] Training 2/2 epoch (loss 0.3164): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 481/840 [30:20<23:22, 3.91s/it] Training 2/2 epoch (loss 0.3164): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 482/840 [30:20<23:15, 3.90s/it] Training 2/2 epoch (loss 0.3164): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 482/840 [30:25<23:15, 3.90s/it] Training 2/2 epoch (loss 0.3164): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 483/840 [30:25<25:53, 4.35s/it] Training 2/2 epoch (loss 0.3926): 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 483/840 [30:30<25:53, 4.35s/it] Training 2/2 epoch (loss 0.3926): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 484/840 [30:30<26:14, 4.42s/it] Training 2/2 epoch (loss 0.1250): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 484/840 [30:33<26:14, 4.42s/it] Training 2/2 epoch (loss 0.1250): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 
485/840 [30:33<23:21, 3.95s/it] Training 2/2 epoch (loss 0.2363): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 485/840 [30:36<23:21, 3.95s/it] Training 2/2 epoch (loss 0.2363): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 486/840 [30:36<21:15, 3.60s/it] Training 2/2 epoch (loss 0.2656): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 486/840 [30:40<21:15, 3.60s/it] Training 2/2 epoch (loss 0.2656): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 487/840 [30:40<21:56, 3.73s/it] Training 2/2 epoch (loss 0.2578): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 487/840 [30:44<21:56, 3.73s/it] Training 2/2 epoch (loss 0.2578): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 488/840 [30:44<22:53, 3.90s/it] Training 2/2 epoch (loss 0.2305): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 488/840 [30:46<22:53, 3.90s/it] Training 2/2 epoch (loss 0.2305): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 489/840 [30:46<20:34, 3.52s/it] Training 2/2 epoch (loss 0.1226): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 489/840 [30:51<20:34, 3.52s/it] Training 2/2 epoch (loss 0.1226): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 490/840 [30:51<21:39, 3.71s/it] Training 2/2 epoch (loss 0.1279): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 490/840 [30:54<21:39, 3.71s/it] Training 2/2 epoch (loss 0.1279): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 491/840 [30:54<20:55, 3.60s/it] Training 2/2 epoch (loss 0.1582): 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 491/840 [31:00<20:55, 3.60s/it] Training 2/2 epoch (loss 0.1582): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 492/840 [31:00<24:20, 4.20s/it] Training 2/2 epoch (loss 0.1992): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 492/840 [31:05<24:20, 4.20s/it] Training 2/2 epoch (loss 0.1992): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 493/840 [31:05<26:23, 4.56s/it] Training 2/2 epoch (loss 0.2637): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 493/840 [31:10<26:23, 4.56s/it] Training 2/2 epoch (loss 0.2637): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 494/840 [31:10<26:30, 4.60s/it] Training 2/2 epoch (loss 0.2148): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 494/840 [31:15<26:30, 4.60s/it] Training 2/2 epoch (loss 0.2148): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 495/840 [31:15<27:55, 4.86s/it] Training 2/2 epoch (loss 0.2617): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 495/840 [31:18<27:55, 4.86s/it] Training 2/2 epoch (loss 0.2617): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ 
| 496/840 [31:18<24:04, 4.20s/it] Training 2/2 epoch (loss 0.1787): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 496/840 [31:21<24:04, 4.20s/it] Training 2/2 epoch (loss 0.1787): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 497/840 [31:21<22:12, 3.89s/it] Training 2/2 epoch (loss 0.1924): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 497/840 [31:24<22:12, 3.89s/it] Training 2/2 epoch (loss 0.1924): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 498/840 [31:24<20:14, 3.55s/it] Training 2/2 epoch (loss 0.1426): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 498/840 [31:28<20:14, 3.55s/it] Training 2/2 epoch (loss 0.1426): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 499/840 [31:28<21:42, 3.82s/it] Training 2/2 epoch (loss 0.1050): 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 499/840 [31:31<21:42, 3.82s/it] Training 2/2 epoch (loss 0.1050): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 500/840 [31:31<20:32, 3.62s/it] Training 2/2 epoch (loss 0.1982): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 500/840 [31:35<20:32, 3.62s/it] Training 2/2 epoch (loss 0.1982): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 501/840 [31:35<19:59, 3.54s/it] Training 2/2 epoch (loss 0.1118): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 501/840 [31:38<19:59, 3.54s/it] Training 2/2 epoch (loss 0.1118): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 502/840 [31:38<19:31, 3.47s/it] Training 2/2 epoch (loss 0.1621): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 502/840 [31:42<19:31, 3.47s/it] Training 2/2 epoch (loss 0.1621): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 503/840 [31:42<21:04, 3.75s/it] Training 2/2 epoch (loss 0.1484): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 503/840 [31:46<21:04, 3.75s/it] Training 2/2 epoch (loss 0.1484): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 504/840 [31:46<21:13, 3.79s/it] Training 2/2 epoch (loss 0.4121): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 504/840 [31:49<21:13, 3.79s/it] Training 2/2 epoch (loss 0.4121): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 505/840 [31:49<19:41, 3.53s/it] Training 2/2 epoch (loss 0.2295): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 505/840 [31:52<19:41, 3.53s/it] Training 2/2 epoch (loss 0.2295): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 506/840 [31:52<18:21, 3.30s/it] Training 2/2 epoch (loss 0.2002): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 506/840 [31:57<18:21, 3.30s/it] Training 2/2 epoch (loss 0.2002): 
60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 507/840 [31:57<21:46, 3.92s/it] Training 2/2 epoch (loss 0.1758): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 507/840 [32:01<21:46, 3.92s/it] Training 2/2 epoch (loss 0.1758): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 508/840 [32:01<21:58, 3.97s/it] Training 2/2 epoch (loss 0.1943): 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 508/840 [32:05<21:58, 3.97s/it] Training 2/2 epoch (loss 0.1943): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 509/840 [32:05<21:03, 3.82s/it] Training 2/2 epoch (loss 0.1816): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 509/840 [32:08<21:03, 3.82s/it] Training 2/2 epoch (loss 0.1816): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 510/840 [32:08<20:25, 3.71s/it] Training 2/2 epoch (loss 0.2637): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 510/840 [32:11<20:25, 3.71s/it] Training 2/2 epoch (loss 0.2637): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 511/840 [32:11<18:45, 3.42s/it] Training 2/2 epoch (loss 0.3730): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 511/840 [32:14<18:45, 3.42s/it] Training 2/2 epoch (loss 0.3730): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 512/840 [32:14<18:16, 3.34s/it] Training 2/2 epoch (loss 0.1030): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 512/840 [32:17<18:16, 3.34s/it] Training 2/2 epoch (loss 0.1030): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 513/840 [32:17<17:15, 3.17s/it] Training 2/2 epoch (loss 0.1602): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 513/840 [32:20<17:15, 3.17s/it] Training 2/2 epoch (loss 0.1602): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 514/840 [32:20<17:17, 3.18s/it] Training 2/2 epoch (loss 0.1680): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 514/840 [32:24<17:17, 3.18s/it] Training 2/2 epoch (loss 0.1680): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 515/840 [32:24<18:14, 3.37s/it] Training 2/2 epoch (loss 0.1016): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 515/840 [32:30<18:14, 3.37s/it] Training 2/2 epoch (loss 0.1016): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 516/840 [32:30<21:48, 4.04s/it] Training 2/2 epoch (loss 0.0908): 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 516/840 [32:34<21:48, 4.04s/it] Training 2/2 epoch (loss 0.0908): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 517/840 [32:34<21:42, 4.03s/it] Training 2/2 epoch (loss 0.1963): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 517/840 [32:38<21:42, 4.03s/it] Training 2/2 
epoch (loss 0.1963): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 518/840 [32:38<21:52, 4.07s/it] Training 2/2 epoch (loss 0.1475): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 518/840 [32:41<21:52, 4.07s/it] Training 2/2 epoch (loss 0.1475): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 519/840 [32:41<19:49, 3.71s/it] Training 2/2 epoch (loss 0.1396): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 519/840 [32:44<19:49, 3.71s/it] Training 2/2 epoch (loss 0.1396): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 520/840 [32:44<18:43, 3.51s/it] Training 2/2 epoch (loss 0.1396): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 520/840 [32:49<18:43, 3.51s/it] Training 2/2 epoch (loss 0.1396): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 521/840 [32:49<21:46, 4.10s/it] Training 2/2 epoch (loss 0.2754): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 521/840 [32:54<21:46, 4.10s/it] Training 2/2 epoch (loss 0.2754): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 522/840 [32:54<22:23, 4.22s/it] Training 2/2 epoch (loss 0.2168): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 522/840 [32:57<22:23, 4.22s/it] Training 2/2 epoch (loss 0.2168): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 523/840 [32:57<20:29, 3.88s/it] Training 2/2 epoch (loss 0.2100): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 523/840 [33:00<20:29, 3.88s/it] Training 2/2 epoch (loss 0.2100): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 524/840 [33:00<20:01, 3.80s/it] Training 2/2 epoch (loss 0.2090): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 524/840 [33:04<20:01, 3.80s/it] Training 2/2 epoch (loss 0.2090): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 525/840 [33:04<20:03, 3.82s/it] Training 2/2 epoch (loss 0.3457): 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 525/840 [33:09<20:03, 3.82s/it] Training 2/2 epoch (loss 0.3457): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 526/840 [33:09<21:05, 4.03s/it] Training 2/2 epoch (loss 0.1553): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 526/840 [33:14<21:05, 4.03s/it] Training 2/2 epoch (loss 0.1553): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 527/840 [33:14<23:07, 4.43s/it] Training 2/2 epoch (loss 0.1738): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 527/840 [33:17<23:07, 4.43s/it] Training 2/2 epoch (loss 0.1738): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 528/840 [33:17<21:12, 4.08s/it] Training 2/2 epoch (loss 0.1709): 
63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 528/840 [33:21<21:12, 4.08s/it] Training 2/2 epoch (loss 0.1709): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 529/840 [33:21<19:40, 3.80s/it] Training 2/2 epoch (loss 0.1895): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 529/840 [33:26<19:40, 3.80s/it] Training 2/2 epoch (loss 0.1895): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 530/840 [33:26<22:10, 4.29s/it] Training 2/2 epoch (loss 0.3242): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 530/840 [33:29<22:10, 4.29s/it] Training 2/2 epoch (loss 0.3242): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 531/840 [33:29<19:33, 3.80s/it] Training 2/2 epoch (loss 0.2256): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 531/840 [33:34<19:33, 3.80s/it] Training 2/2 epoch (loss 0.2256): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 532/840 [33:34<22:07, 4.31s/it] Training 2/2 epoch (loss 0.3926): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 532/840 [33:40<22:07, 4.31s/it] Training 2/2 epoch (loss 0.3926): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 533/840 [33:40<23:48, 4.65s/it] Training 2/2 epoch (loss 0.2246): 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 533/840 [33:43<23:48, 4.65s/it] Training 2/2 epoch (loss 0.2246): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 534/840 [33:43<21:21, 4.19s/it] Training 2/2 epoch (loss 0.4336): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 534/840 [33:46<21:21, 4.19s/it] Training 2/2 epoch (loss 0.4336): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 535/840 [33:46<20:31, 4.04s/it] Training 2/2 epoch (loss 0.2637): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 535/840 [33:50<20:31, 4.04s/it] Training 2/2 epoch (loss 0.2637): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 536/840 [33:50<19:36, 3.87s/it] Training 2/2 epoch (loss 0.2715): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 536/840 [33:53<19:36, 3.87s/it] Training 2/2 epoch (loss 0.2715): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 537/840 [33:53<18:33, 3.67s/it] Training 2/2 epoch (loss 0.1426): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 537/840 [33:57<18:33, 3.67s/it] Training 2/2 epoch (loss 0.1426): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 538/840 [33:57<19:27, 3.87s/it] Training 2/2 epoch (loss 0.1660): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 538/840 [34:02<19:27, 3.87s/it] Training 2/2 epoch (loss 0.1660): 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 
| 539/840 [34:02<20:38, 4.11s/it] Training 2/2 epoch (loss 0.2246): 64%|██████▍ | 539/840 [34:05<20:38, 4.11s/it] Training 2/2 epoch (loss 0.2246): 64%|██████▍ | 540/840 [34:05<19:13, 3.84s/it] Training 2/2 epoch (loss 0.1416): 64%|██████▍ | 540/840 [34:09<19:13, 3.84s/it] Training 2/2 epoch (loss 0.1416): 64%|██████▍ | 541/840 [34:09<19:11, 3.85s/it] Training 2/2 epoch (loss 0.1367): 64%|██████▍ | 541/840 [34:13<19:11, 3.85s/it] Training 2/2 epoch (loss 0.1367): 65%|██████▍ | 542/840 [34:13<18:58, 3.82s/it] Training 2/2 epoch (loss 0.2119): 65%|██████▍ | 542/840 [34:16<18:58, 3.82s/it] Training 2/2 epoch (loss 0.2119): 65%|██████▍ | 543/840 [34:16<17:51, 3.61s/it] Training 2/2 epoch (loss 0.1172): 65%|██████▍ | 543/840 [34:20<17:51, 3.61s/it] Training 2/2 epoch (loss 0.1172): 65%|██████▍ | 544/840 [34:20<18:32, 3.76s/it] Training 2/2 epoch (loss 0.1660): 65%|██████▍ | 544/840 [34:23<18:32, 3.76s/it] Training 2/2 epoch (loss 0.1660): 65%|██████▍ | 545/840 [34:23<16:48, 3.42s/it] Training 2/2 epoch (loss 0.1953): 65%|██████▍ | 545/840 [34:26<16:48, 3.42s/it] Training 2/2 epoch (loss 0.1953): 65%|██████▌ | 546/840 [34:26<16:23, 3.34s/it] Training 2/2 epoch (loss 0.1216): 65%|██████▌ | 546/840 [34:29<16:23, 3.34s/it] Training 2/2 epoch (loss 0.1216): 65%|██████▌ | 547/840 [34:29<16:03, 3.29s/it] Training 2/2 epoch (loss 0.3730): 65%|██████▌ | 547/840 [34:35<16:03, 3.29s/it] Training 2/2 epoch (loss 0.3730): 65%|██████▌ | 548/840 [34:35<19:12, 3.95s/it] Training 2/2 epoch (loss 0.2432): 65%|██████▌ | 548/840 [34:37<19:12, 3.95s/it] Training 2/2 epoch (loss 0.2432): 65%|██████▌ | 549/840 [34:37<17:41, 3.65s/it] Training 2/2 epoch (loss 0.2100): 65%|██████▌ | 549/840 [34:41<17:41,
3.65s/it] Training 2/2 epoch (loss 0.2100): 65%|██████▌ | 550/840 [34:41<16:47, 3.47s/it] Training 2/2 epoch (loss 0.1396): 65%|██████▌ | 550/840 [34:43<16:47, 3.47s/it] Training 2/2 epoch (loss 0.1396): 66%|██████▌ | 551/840 [34:43<15:38, 3.25s/it] Training 2/2 epoch (loss 0.1216): 66%|██████▌ | 551/840 [34:48<15:38, 3.25s/it] Training 2/2 epoch (loss 0.1216): 66%|██████▌ | 552/840 [34:48<17:54, 3.73s/it] Training 2/2 epoch (loss 0.2656): 66%|██████▌ | 552/840 [34:52<17:54, 3.73s/it] Training 2/2 epoch (loss 0.2656): 66%|██████▌ | 553/840 [34:52<18:14, 3.81s/it] Training 2/2 epoch (loss 0.1924): 66%|██████▌ | 553/840 [34:57<18:14, 3.81s/it] Training 2/2 epoch (loss 0.1924): 66%|██████▌ | 554/840 [34:57<19:05, 4.00s/it] Training 2/2 epoch (loss 0.1299): 66%|██████▌ | 554/840 [35:00<19:05, 4.00s/it] Training 2/2 epoch (loss 0.1299): 66%|██████▌ | 555/840 [35:00<17:44, 3.74s/it] Training 2/2 epoch (loss 0.2168): 66%|██████▌ | 555/840 [35:04<17:44, 3.74s/it] Training 2/2 epoch (loss 0.2168): 66%|██████▌ | 556/840 [35:04<17:52, 3.78s/it] Training 2/2 epoch (loss 0.1455): 66%|██████▌ | 556/840 [35:06<17:52, 3.78s/it] Training 2/2 epoch (loss 0.1455): 66%|██████▋ | 557/840 [35:06<16:07, 3.42s/it] Training 2/2 epoch (loss 0.1016): 66%|██████▋ | 557/840 [35:11<16:07, 3.42s/it] Training 2/2 epoch (loss 0.1016): 66%|██████▋ | 558/840 [35:11<17:52, 3.80s/it] Training 2/2 epoch (loss 0.1484): 66%|██████▋ | 558/840 [35:15<17:52, 3.80s/it] Training 2/2 epoch (loss 0.1484): 67%|██████▋ | 559/840 [35:15<18:08, 3.88s/it] Training 2/2 epoch (loss 0.2012): 67%|██████▋ | 559/840 [35:18<18:08, 3.88s/it] Training 2/2 epoch (loss 0.2012): 67%|██████▋ | 560/840 [35:18<17:21, 3.72s/it] Training 2/2
epoch (loss 0.0820): 67%|██████▋ | 560/840 [35:23<17:21, 3.72s/it] Training 2/2 epoch (loss 0.0820): 67%|██████▋ | 561/840 [35:23<18:45, 4.03s/it] Training 2/2 epoch (loss 0.1328): 67%|██████▋ | 561/840 [35:29<18:45, 4.03s/it] Training 2/2 epoch (loss 0.1328): 67%|██████▋ | 562/840 [35:29<20:44, 4.47s/it] Training 2/2 epoch (loss 0.1191): 67%|██████▋ | 562/840 [35:32<20:44, 4.47s/it] Training 2/2 epoch (loss 0.1191): 67%|██████▋ | 563/840 [35:32<19:39, 4.26s/it] Training 2/2 epoch (loss 0.1128): 67%|██████▋ | 563/840 [35:35<19:39, 4.26s/it] Training 2/2 epoch (loss 0.1128): 67%|██████▋ | 564/840 [35:35<18:01, 3.92s/it] Training 2/2 epoch (loss 0.2773): 67%|██████▋ | 564/840 [35:38<18:01, 3.92s/it] Training 2/2 epoch (loss 0.2773): 67%|██████▋ | 565/840 [35:38<16:38, 3.63s/it] Training 2/2 epoch (loss 0.3750): 67%|██████▋ | 565/840 [35:43<16:38, 3.63s/it] Training 2/2 epoch (loss 0.3750): 67%|██████▋ | 566/840 [35:43<18:33, 4.06s/it] Training 2/2 epoch (loss 0.1719): 67%|██████▋ | 566/840 [35:47<18:33, 4.06s/it] Training 2/2 epoch (loss 0.1719): 68%|██████▊ | 567/840 [35:47<18:02, 3.97s/it] Training 2/2 epoch (loss 0.2598): 68%|██████▊ | 567/840 [35:51<18:02, 3.97s/it] Training 2/2 epoch (loss 0.2598): 68%|██████▊ | 568/840 [35:51<17:52, 3.94s/it] Training 2/2 epoch (loss 0.2188): 68%|██████▊ | 568/840 [35:55<17:52, 3.94s/it] Training 2/2 epoch (loss 0.2188): 68%|██████▊ | 569/840 [35:55<17:44, 3.93s/it] Training 2/2 epoch (loss 0.1738): 68%|██████▊ | 569/840 [35:58<17:44, 3.93s/it] Training 2/2 epoch (loss 0.1738): 68%|██████▊ | 570/840 [35:58<16:33, 3.68s/it] Training 2/2 epoch (loss 0.1846): 68%|██████▊ | 570/840 [36:02<16:33, 3.68s/it] Training 2/2 epoch (loss 0.1846):
68%|██████▊ | 571/840 [36:02<16:30, 3.68s/it] Training 2/2 epoch (loss 0.1992): 68%|██████▊ | 571/840 [36:05<16:30, 3.68s/it] Training 2/2 epoch (loss 0.1992): 68%|██████▊ | 572/840 [36:05<16:31, 3.70s/it] Training 2/2 epoch (loss 0.1885): 68%|██████▊ | 572/840 [36:09<16:31, 3.70s/it] Training 2/2 epoch (loss 0.1885): 68%|██████▊ | 573/840 [36:09<16:28, 3.70s/it] Training 2/2 epoch (loss 0.1064): 68%|██████▊ | 573/840 [36:12<16:28, 3.70s/it] Training 2/2 epoch (loss 0.1064): 68%|██████▊ | 574/840 [36:12<15:46, 3.56s/it] Training 2/2 epoch (loss 0.2471): 68%|██████▊ | 574/840 [36:17<15:46, 3.56s/it] Training 2/2 epoch (loss 0.2471): 68%|██████▊ | 575/840 [36:17<17:01, 3.85s/it] Training 2/2 epoch (loss 0.1533): 68%|██████▊ | 575/840 [36:21<17:01, 3.85s/it] Training 2/2 epoch (loss 0.1533): 69%|██████▊ | 576/840 [36:21<16:49, 3.82s/it] Training 2/2 epoch (loss 0.1758): 69%|██████▊ | 576/840 [36:24<16:49, 3.82s/it] Training 2/2 epoch (loss 0.1758): 69%|██████▊ | 577/840 [36:24<15:55, 3.63s/it] Training 2/2 epoch (loss 0.2754): 69%|██████▊ | 577/840 [36:27<15:55, 3.63s/it] Training 2/2 epoch (loss 0.2754): 69%|██████▉ | 578/840 [36:27<14:50, 3.40s/it] Training 2/2 epoch (loss 0.1699): 69%|██████▉ | 578/840 [36:30<14:50, 3.40s/it] Training 2/2 epoch (loss 0.1699): 69%|██████▉ | 579/840 [36:30<15:12, 3.50s/it] Training 2/2 epoch (loss 0.0752): 69%|██████▉ | 579/840 [36:35<15:12, 3.50s/it] Training 2/2 epoch (loss 0.0752): 69%|██████▉ | 580/840 [36:35<16:09, 3.73s/it] Training 2/2 epoch (loss 0.3008): 69%|██████▉ | 580/840 [36:39<16:09, 3.73s/it] Training 2/2 epoch (loss 0.3008): 69%|██████▉ | 581/840 [36:39<17:02, 3.95s/it] Training 2/2 epoch (loss 0.5117): 69%|██████▉
| 581/840 [36:45<17:02, 3.95s/it] Training 2/2 epoch (loss 0.5117): 69%|██████▉ | 582/840 [36:45<19:03, 4.43s/it] Training 2/2 epoch (loss 0.0981): 69%|██████▉ | 582/840 [36:49<19:03, 4.43s/it] Training 2/2 epoch (loss 0.0981): 69%|██████▉ | 583/840 [36:49<18:40, 4.36s/it] Training 2/2 epoch (loss 0.1602): 69%|██████▉ | 583/840 [36:52<18:40, 4.36s/it] Training 2/2 epoch (loss 0.1602): 70%|██████▉ | 584/840 [36:52<17:02, 3.99s/it] Training 2/2 epoch (loss 0.1006): 70%|██████▉ | 584/840 [36:56<17:02, 3.99s/it] Training 2/2 epoch (loss 0.1006): 70%|██████▉ | 585/840 [36:56<16:32, 3.89s/it] Training 2/2 epoch (loss 0.1758): 70%|██████▉ | 585/840 [36:59<16:32, 3.89s/it] Training 2/2 epoch (loss 0.1758): 70%|██████▉ | 586/840 [36:59<15:51, 3.75s/it] Training 2/2 epoch (loss 0.2178): 70%|██████▉ | 586/840 [37:02<15:51, 3.75s/it] Training 2/2 epoch (loss 0.2178): 70%|██████▉ | 587/840 [37:02<14:57, 3.55s/it] Training 2/2 epoch (loss 0.1338): 70%|██████▉ | 587/840 [37:06<14:57, 3.55s/it] Training 2/2 epoch (loss 0.1338): 70%|███████ | 588/840 [37:06<14:55, 3.56s/it] Training 2/2 epoch (loss 0.1318): 70%|███████ | 588/840 [37:09<14:55, 3.56s/it] Training 2/2 epoch (loss 0.1318): 70%|███████ | 589/840 [37:09<14:36, 3.49s/it] Training 2/2 epoch (loss 0.1621): 70%|███████ | 589/840 [37:13<14:36, 3.49s/it] Training 2/2 epoch (loss 0.1621): 70%|███████ | 590/840 [37:13<14:39, 3.52s/it] Training 2/2 epoch (loss 0.0493): 70%|███████ | 590/840 [37:17<14:39, 3.52s/it] Training 2/2 epoch (loss 0.0493): 70%|███████ | 591/840 [37:17<15:18, 3.69s/it] Training 2/2 epoch (loss 0.0947): 70%|███████ | 591/840 [37:20<15:18, 3.69s/it] Training 2/2 epoch (loss 0.0947): 70%|███████ | 592/840 [37:20<15:05,
3.65s/it] Training 2/2 epoch (loss 0.1138): 70%|███████ | 592/840 [37:24<15:05, 3.65s/it] Training 2/2 epoch (loss 0.1138): 71%|███████ | 593/840 [37:24<15:11, 3.69s/it] Training 2/2 epoch (loss 0.1855): 71%|███████ | 593/840 [37:27<15:11, 3.69s/it] Training 2/2 epoch (loss 0.1855): 71%|███████ | 594/840 [37:27<14:00, 3.42s/it] Training 2/2 epoch (loss 0.0664): 71%|███████ | 594/840 [37:30<14:00, 3.42s/it] Training 2/2 epoch (loss 0.0664): 71%|███████ | 595/840 [37:30<13:05, 3.20s/it] Training 2/2 epoch (loss 0.1289): 71%|███████ | 595/840 [37:33<13:05, 3.20s/it] Training 2/2 epoch (loss 0.1289): 71%|███████ | 596/840 [37:33<12:58, 3.19s/it] Training 2/2 epoch (loss 0.1099): 71%|███████ | 596/840 [37:38<12:58, 3.19s/it] Training 2/2 epoch (loss 0.1099): 71%|███████ | 597/840 [37:38<15:41, 3.87s/it] Training 2/2 epoch (loss 0.1865): 71%|███████ | 597/840 [37:42<15:41, 3.87s/it] Training 2/2 epoch (loss 0.1865): 71%|███████ | 598/840 [37:42<15:20, 3.80s/it] Training 2/2 epoch (loss 0.2715): 71%|███████ | 598/840 [37:46<15:20, 3.80s/it] Training 2/2 epoch (loss 0.2715): 71%|███████▏ | 599/840 [37:46<15:46, 3.93s/it] Training 2/2 epoch (loss 0.0527): 71%|███████▏ | 599/840 [37:49<15:46, 3.93s/it] Training 2/2 epoch (loss 0.0527): 71%|███████▏ | 600/840 [37:49<14:06, 3.53s/it] Training 2/2 epoch (loss 0.0483): 71%|███████▏ | 600/840 [37:52<14:06, 3.53s/it] Training 2/2 epoch (loss 0.0483): 72%|███████▏ | 601/840 [37:52<13:37, 3.42s/it] Training 2/2 epoch (loss 0.0713): 72%|███████▏ | 601/840 [37:55<13:37, 3.42s/it] Training 2/2 epoch (loss 0.0713): 72%|███████▏ | 602/840 [37:55<12:44, 3.21s/it] Training 2/2 epoch (loss 0.0383): 72%|███████▏ | 602/840 [37:58<12:44,
3.21s/it] Training 2/2 epoch (loss 0.0383): 72%|███████▏ | 603/840 [37:58<12:31, 3.17s/it] Training 2/2 epoch (loss 0.0771): 72%|███████▏ | 603/840 [38:01<12:31, 3.17s/it] Training 2/2 epoch (loss 0.0771): 72%|███████▏ | 604/840 [38:01<12:27, 3.17s/it] Training 2/2 epoch (loss 0.1963): 72%|███████▏ | 604/840 [38:04<12:27, 3.17s/it] Training 2/2 epoch (loss 0.1963): 72%|███████▏ | 605/840 [38:04<12:50, 3.28s/it] Training 2/2 epoch (loss 0.0376): 72%|███████▏ | 605/840 [38:09<12:50, 3.28s/it] Training 2/2 epoch (loss 0.0376): 72%|███████▏ | 606/840 [38:09<14:00, 3.59s/it] Training 2/2 epoch (loss 0.3691): 72%|███████▏ | 606/840 [38:12<14:00, 3.59s/it] Training 2/2 epoch (loss 0.3691): 72%|███████▏ | 607/840 [38:12<13:46, 3.55s/it] Training 2/2 epoch (loss 0.1816): 72%|███████▏ | 607/840 [38:17<13:46, 3.55s/it] Training 2/2 epoch (loss 0.1816): 72%|███████▏ | 608/840 [38:17<14:47, 3.82s/it] Training 2/2 epoch (loss 0.2197): 72%|███████▏ | 608/840 [38:20<14:47, 3.82s/it] Training 2/2 epoch (loss 0.2197): 72%|███████▎ | 609/840 [38:20<13:50, 3.59s/it] Training 2/2 epoch (loss 0.1250): 72%|███████▎ | 609/840 [38:23<13:50, 3.59s/it] Training 2/2 epoch (loss 0.1250): 73%|███████▎ | 610/840 [38:23<13:24, 3.50s/it] Training 2/2 epoch (loss 0.0530): 73%|███████▎ | 610/840 [38:27<13:24, 3.50s/it] Training 2/2 epoch (loss 0.0530): 73%|███████▎ | 611/840 [38:27<13:44, 3.60s/it] Training 2/2 epoch (loss 0.0854): 73%|███████▎ | 611/840 [38:30<13:44, 3.60s/it] Training 2/2 epoch (loss 0.0854): 73%|███████▎ | 612/840 [38:30<13:38, 3.59s/it] Training 2/2 epoch (loss 0.1357): 73%|███████▎ | 612/840 [38:33<13:38, 3.59s/it] Training 2/2 epoch (loss 0.1357):
73%|███████▎ | 613/840 [38:33<12:58, 3.43s/it] Training 2/2 epoch (loss 0.1416): 73%|███████▎ | 613/840 [38:37<12:58, 3.43s/it] Training 2/2 epoch (loss 0.1416): 73%|███████▎ | 614/840 [38:37<13:20, 3.54s/it] Training 2/2 epoch (loss 0.1016): 73%|███████▎ | 614/840 [38:40<13:20, 3.54s/it] Training 2/2 epoch (loss 0.1016): 73%|███████▎ | 615/840 [38:40<12:29, 3.33s/it] Training 2/2 epoch (loss 0.1040): 73%|███████▎ | 615/840 [38:43<12:29, 3.33s/it] Training 2/2 epoch (loss 0.1040): 73%|███████▎ | 616/840 [38:43<12:27, 3.34s/it] Training 2/2 epoch (loss 0.4531): 73%|███████▎ | 616/840 [38:49<12:27, 3.34s/it] Training 2/2 epoch (loss 0.4531): 73%|███████▎ | 617/840 [38:49<14:48, 3.98s/it] Training 2/2 epoch (loss 0.3594): 73%|███████▎ | 617/840 [38:53<14:48, 3.98s/it] Training 2/2 epoch (loss 0.3594): 74%|███████▎ | 618/840 [38:53<14:30, 3.92s/it] Training 2/2 epoch (loss 0.1992): 74%|███████▎ | 618/840 [38:56<14:30, 3.92s/it] Training 2/2 epoch (loss 0.1992): 74%|███████▎ | 619/840 [38:56<13:27, 3.66s/it] Training 2/2 epoch (loss 0.0518): 74%|███████▎ | 619/840 [38:59<13:27, 3.66s/it] Training 2/2 epoch (loss 0.0518): 74%|███████▍ | 620/840 [38:59<13:09, 3.59s/it] Training 2/2 epoch (loss 0.0732): 74%|███████▍ | 620/840 [39:03<13:09, 3.59s/it] Training 2/2 epoch (loss 0.0732): 74%|███████▍ | 621/840 [39:03<13:11, 3.61s/it] Training 2/2 epoch (loss 0.1299): 74%|███████▍ | 621/840 [39:06<13:11, 3.61s/it] Training 2/2 epoch (loss 0.1299): 74%|███████▍ | 622/840 [39:06<13:04, 3.60s/it] Training 2/2 epoch (loss 0.0386): 74%|███████▍ | 622/840 [39:09<13:04, 3.60s/it] Training 2/2 epoch (loss 0.0386): 74%|███████▍ | 623/840 [39:09<11:55,
3.30s/it] Training 2/2 epoch (loss 0.2275): 74%|███████▍ | 623/840 [39:13<11:55, 3.30s/it] Training 2/2 epoch (loss 0.2275): 74%|███████▍ | 624/840 [39:13<12:23, 3.44s/it] Training 2/2 epoch (loss 0.1367): 74%|███████▍ | 624/840 [39:18<12:23, 3.44s/it] Training 2/2 epoch (loss 0.1367): 74%|███████▍ | 625/840 [39:18<13:52, 3.87s/it] Training 2/2 epoch (loss 0.1748): 74%|███████▍ | 625/840 [39:21<13:52, 3.87s/it] Training 2/2 epoch (loss 0.1748): 75%|███████▍ | 626/840 [39:21<13:21, 3.75s/it] Training 2/2 epoch (loss 0.0918): 75%|███████▍ | 626/840 [39:24<13:21, 3.75s/it] Training 2/2 epoch (loss 0.0918): 75%|███████▍ | 627/840 [39:24<12:44, 3.59s/it] Training 2/2 epoch (loss 0.0654): 75%|███████▍ | 627/840 [39:29<12:44, 3.59s/it] Training 2/2 epoch (loss 0.0654): 75%|███████▍ | 628/840 [39:29<13:52, 3.93s/it] Training 2/2 epoch (loss 0.1006): 75%|███████▍ | 628/840 [39:32<13:52, 3.93s/it] Training 2/2 epoch (loss 0.1006): 75%|███████▍ | 629/840 [39:32<12:34, 3.58s/it] Training 2/2 epoch (loss 0.1133): 75%|███████▍ | 629/840 [39:35<12:34, 3.58s/it] Training 2/2 epoch (loss 0.1133): 75%|███████▌ | 630/840 [39:35<12:10, 3.48s/it] Training 2/2 epoch (loss 0.0520): 75%|███████▌ | 630/840 [39:41<12:10, 3.48s/it] Training 2/2 epoch (loss 0.0520): 75%|███████▌ | 631/840 [39:41<14:11, 4.07s/it] Training 2/2 epoch (loss 0.0732): 75%|███████▌ | 631/840 [39:43<14:11, 4.07s/it] Training 2/2 epoch (loss 0.0732): 75%|███████▌ | 632/840 [39:43<12:59, 3.75s/it] Training 2/2 epoch (loss 0.0791): 75%|███████▌ | 632/840 [39:47<12:59, 3.75s/it] Training 2/2 epoch (loss 0.0791): 75%|███████▌ | 633/840 [39:47<12:15, 3.55s/it] Training 2/2 epoch (loss 0.0435):
75%|███████▌ | 633/840 [39:50<12:15, 3.55s/it] Training 2/2 epoch (loss 0.0435): 75%|███████▌ | 634/840 [39:50<12:10, 3.54s/it] Training 2/2 epoch (loss 0.1533): 75%|███████▌ | 634/840 [39:56<12:10, 3.54s/it] Training 2/2 epoch (loss 0.1533): 76%|███████▌ | 635/840 [39:56<14:03, 4.11s/it] Training 2/2 epoch (loss 0.1060): 76%|███████▌ | 635/840 [40:00<14:03, 4.11s/it] Training 2/2 epoch (loss 0.1060): 76%|███████▌ | 636/840 [40:00<13:58, 4.11s/it] Training 2/2 epoch (loss 0.1250): 76%|███████▌ | 636/840 [40:04<13:58, 4.11s/it] Training 2/2 epoch (loss 0.1250): 76%|███████▌ | 637/840 [40:04<13:39, 4.03s/it] Training 2/2 epoch (loss 0.1250): 76%|███████▌ | 637/840 [40:08<13:39, 4.03s/it] Training 2/2 epoch (loss 0.1250): 76%|███████▌ | 638/840 [40:08<14:31, 4.32s/it] Training 2/2 epoch (loss 0.0566): 76%|███████▌ | 638/840 [40:12<14:31, 4.32s/it] Training 2/2 epoch (loss 0.0566): 76%|███████▌ | 639/840 [40:12<13:46, 4.11s/it] Training 2/2 epoch (loss 0.0253): 76%|███████▌ | 639/840 [40:16<13:46, 4.11s/it] Training 2/2 epoch (loss 0.0253): 76%|███████▌ | 640/840 [40:16<13:52, 4.16s/it] Training 2/2 epoch (loss 0.0942): 76%|███████▌ | 640/840 [40:19<13:52, 4.16s/it] Training 2/2 epoch (loss 0.0942): 76%|███████▋ | 641/840 [40:19<12:22, 3.73s/it] Training 2/2 epoch (loss 0.0312): 76%|███████▋ | 641/840 [40:24<12:22, 3.73s/it] Training 2/2 epoch (loss 0.0312): 76%|███████▋ | 642/840 [40:24<13:28, 4.09s/it] Training 2/2 epoch (loss 0.1504): 76%|███████▋ | 642/840 [40:27<13:28, 4.09s/it] Training 2/2 epoch (loss 0.1504): 77%|███████▋ | 643/840 [40:27<12:38, 3.85s/it] Training 2/2 epoch (loss 0.1167): 77%|███████▋ | 643/840 [40:30<12:38,
3.85s/it] Training 2/2 epoch (loss 0.1167): 77%|███████▋ | 644/840 [40:30<11:17, 3.46s/it] Training 2/2 epoch (loss 0.0601): 77%|███████▋ | 644/840 [40:34<11:17, 3.46s/it] Training 2/2 epoch (loss 0.0601): 77%|███████▋ | 645/840 [40:34<12:14, 3.77s/it] Training 2/2 epoch (loss 0.1670): 77%|███████▋ | 645/840 [40:38<12:14, 3.77s/it] Training 2/2 epoch (loss 0.1670): 77%|███████▋ | 646/840 [40:38<11:59, 3.71s/it] Training 2/2 epoch (loss 0.2041): 77%|███████▋ | 646/840 [40:41<11:59, 3.71s/it] Training 2/2 epoch (loss 0.2041): 77%|███████▋ | 647/840 [40:41<11:18, 3.51s/it] Training 2/2 epoch (loss 0.1074): 77%|███████▋ | 647/840 [40:47<11:18, 3.51s/it] Training 2/2 epoch (loss 0.1074): 77%|███████▋ | 648/840 [40:47<13:13, 4.13s/it] Training 2/2 epoch (loss 0.2090): 77%|███████▋ | 648/840 [40:50<13:13, 4.13s/it] Training 2/2 epoch (loss 0.2090): 77%|███████▋ | 649/840 [40:50<12:23, 3.89s/it] Training 2/2 epoch (loss 0.1138): 77%|███████▋ | 649/840 [40:54<12:23, 3.89s/it] Training 2/2 epoch (loss 0.1138): 77%|███████▋ | 650/840 [40:54<12:54, 4.07s/it] Training 2/2 epoch (loss 0.1758): 77%|███████▋ | 650/840 [40:57<12:54, 4.07s/it] Training 2/2 epoch (loss 0.1758): 78%|███████▊ | 651/840 [40:57<11:42, 3.72s/it] Training 2/2 epoch (loss 0.0535): 78%|███████▊ | 651/840 [41:00<11:42, 3.72s/it] Training 2/2 epoch (loss 0.0535): 78%|███████▊ | 652/840 [41:00<11:08, 3.56s/it] Training 2/2 epoch (loss 0.0791): 78%|███████▊ | 652/840 [41:05<11:08, 3.56s/it] Training 2/2 epoch (loss 0.0791): 78%|███████▊ | 653/840 [41:05<12:03, 3.87s/it] Training 2/2 epoch (loss 0.0266): 78%|███████▊ | 653/840 [41:08<12:03, 3.87s/it] Training 2/2 epoch (loss 0.0266):
78%|███████▊ | 654/840 [41:08<11:01, 3.56s/it] Training 2/2 epoch (loss 0.3711): 78%|███████▊ | 654/840 [41:11<11:01, 3.56s/it] Training 2/2 epoch (loss 0.3711): 78%|███████▊ | 655/840 [41:11<10:17, 3.34s/it] Training 2/2 epoch (loss 0.1777): 78%|███████▊ | 655/840 [41:16<10:17, 3.34s/it] Training 2/2 epoch (loss 0.1777): 78%|███████▊ | 656/840 [41:16<12:18, 4.01s/it] Training 2/2 epoch (loss 0.0767): 78%|███████▊ | 656/840 [41:21<12:18, 4.01s/it] Training 2/2 epoch (loss 0.0767): 78%|███████▊ | 657/840 [41:21<12:38, 4.15s/it] Training 2/2 epoch (loss 0.1006): 78%|███████▊ | 657/840 [41:24<12:38, 4.15s/it] Training 2/2 epoch (loss 0.1006): 78%|███████▊ | 658/840 [41:24<11:44, 3.87s/it] Training 2/2 epoch (loss 0.3164): 78%|███████▊ | 658/840 [41:29<11:44, 3.87s/it] Training 2/2 epoch (loss 0.3164): 78%|███████▊ | 659/840 [41:29<13:04, 4.33s/it] Training 2/2 epoch (loss 0.1758): 78%|███████▊ | 659/840 [41:33<13:04, 4.33s/it] Training 2/2 epoch (loss 0.1758): 79%|███████▊ | 660/840 [41:33<12:30, 4.17s/it] Training 2/2 epoch (loss 0.1099): 79%|███████▊ | 660/840 [41:38<12:30, 4.17s/it] Training 2/2 epoch (loss 0.1099): 79%|███████▊ | 661/840 [41:38<12:44, 4.27s/it] Training 2/2 epoch (loss 0.1011): 79%|███████▊ | 661/840 [41:42<12:44, 4.27s/it] Training 2/2 epoch (loss 0.1011): 79%|███████▉ | 662/840 [41:42<12:25, 4.19s/it] Training 2/2 epoch (loss 0.1416): 79%|███████▉ | 662/840 [41:45<12:25, 4.19s/it] Training 2/2 epoch (loss 0.1416): 79%|███████▉ | 663/840 [41:45<11:42, 3.97s/it] Training 2/2 epoch (loss 0.0640): 79%|███████▉ | 663/840 [41:51<11:42, 3.97s/it] Training 2/2 epoch (loss 0.0640): 79%|███████▉ | 664/840 [41:51<12:58,
4.42s/it] Training 2/2 epoch (loss 0.0928): 79%|███████▉ | 664/840 [41:54<12:58, 4.42s/it] Training 2/2 epoch (loss 0.0928): 79%|███████▉ | 665/840 [41:54<11:50, 4.06s/it] Training 2/2 epoch (loss 0.2812): 79%|███████▉ | 665/840 [41:57<11:50, 4.06s/it] Training 2/2 epoch (loss 0.2812): 79%|███████▉ | 666/840 [41:57<10:58, 3.78s/it] Training 2/2 epoch (loss 0.0664): 79%|███████▉ | 666/840 [42:00<10:58, 3.78s/it] Training 2/2 epoch (loss 0.0664): 79%|███████▉ | 667/840 [42:00<09:51, 3.42s/it] Training 2/2 epoch (loss 0.0898): 79%|███████▉ | 667/840 [42:05<09:51, 3.42s/it] Training 2/2 epoch (loss 0.0898): 80%|███████▉ | 668/840 [42:05<11:38, 4.06s/it] Training 2/2 epoch (loss 0.1152): 80%|███████▉ | 668/840 [42:08<11:38, 4.06s/it] Training 2/2 epoch (loss 0.1152): 80%|███████▉ | 669/840 [42:08<10:42, 3.76s/it] Training 2/2 epoch (loss 0.0559): 80%|███████▉ | 669/840 [42:11<10:42, 3.76s/it] Training 2/2 epoch (loss 0.0559): 80%|███████▉ | 670/840 [42:11<09:58, 3.52s/it] Training 2/2 epoch (loss 0.1133): 80%|███████▉ | 670/840 [42:15<09:58, 3.52s/it] Training 2/2 epoch (loss 0.1133): 80%|███████▉ | 671/840 [42:15<10:03, 3.57s/it] Training 2/2 epoch (loss 0.0549): 80%|███████▉ | 671/840 [42:18<10:03, 3.57s/it] Training 2/2 epoch (loss 0.0549): 80%|████████ | 672/840 [42:18<09:42, 3.47s/it] Training 2/2 epoch (loss 0.0461): 80%|████████ | 672/840 [42:22<09:42, 3.47s/it] Training 2/2 epoch (loss 0.0461): 80%|████████ | 673/840 [42:22<09:47, 3.52s/it] Training 2/2 epoch (loss 0.1660): 80%|████████ | 673/840 [42:26<09:47, 3.52s/it] Training 2/2 epoch (loss 0.1660): 80%|████████ | 674/840 [42:26<10:02, 3.63s/it] Training 2/2 epoch (loss 0.1143):
80%|████████ | 674/840 [42:29<10:02, 3.63s/it] Training 2/2 epoch (loss 0.1143): 80%|████████ | 675/840 [42:29<09:43, 3.54s/it] Training 2/2 epoch (loss 0.0518): 80%|████████ | 675/840 [42:33<09:43, 3.54s/it] Training 2/2 epoch (loss 0.0518): 80%|████████ | 676/840 [42:33<09:55, 3.63s/it] Training 2/2 epoch (loss 0.0432): 80%|████████ | 676/840 [42:38<09:55, 3.63s/it] Training 2/2 epoch (loss 0.0432): 81%|████████ | 677/840 [42:38<10:48, 3.98s/it] Training 2/2 epoch (loss 0.0664): 81%|████████ | 677/840 [42:41<10:48, 3.98s/it] Training 2/2 epoch (loss 0.0664): 81%|████████ | 678/840 [42:41<10:13, 3.79s/it] Training 2/2 epoch (loss 0.1553): 81%|████████ | 678/840 [42:44<10:13, 3.79s/it] Training 2/2 epoch (loss 0.1553): 81%|████████ | 679/840 [42:44<09:24, 3.51s/it] Training 2/2 epoch (loss 0.0986): 81%|████████ | 679/840 [42:47<09:24, 3.51s/it] Training 2/2 epoch (loss 0.0986): 81%|████████ | 680/840 [42:47<09:30, 3.57s/it] Training 2/2 epoch (loss 0.0806): 81%|████████ | 680/840 [42:50<09:30, 3.57s/it] Training 2/2 epoch (loss 0.0806): 81%|████████ | 681/840 [42:50<08:41, 3.28s/it] Training 2/2 epoch (loss 0.0354): 81%|████████ | 681/840 [42:54<08:41, 3.28s/it] Training 2/2 epoch (loss 0.0354): 81%|████████ | 682/840 [42:54<08:51, 3.36s/it] Training 2/2 epoch (loss 0.1641): 81%|████████ | 682/840 [42:56<08:51, 3.36s/it] Training 2/2 epoch (loss 0.1641): 81%|████████▏ | 683/840 [42:56<08:14, 3.15s/it] Training 2/2 epoch (loss 0.0679): 81%|████████▏ | 683/840 [42:59<08:14, 3.15s/it] Training 2/2 epoch (loss 0.0679): 81%|████████▏ | 684/840 [42:59<07:43, 2.97s/it] Training 2/2 epoch (loss 0.2910): 81%|████████▏ | 684/840
[43:02<07:43, 2.97s/it] Training 2/2 epoch (loss 0.2910): 82%|████████▏ | 685/840 [43:02<07:36, 2.95s/it] Training 2/2 epoch (loss 0.3105): 82%|████████▏ | 685/840 [43:07<07:36, 2.95s/it] Training 2/2 epoch (loss 0.3105): 82%|████████▏ | 686/840 [43:07<09:31, 3.71s/it] Training 2/2 epoch (loss 0.2695): 82%|████████▏ | 686/840 [43:10<09:31, 3.71s/it] Training 2/2 epoch (loss 0.2695): 82%|████████▏ | 687/840 [43:10<08:55, 3.50s/it] Training 2/2 epoch (loss 0.0962): 82%|████████▏ | 687/840 [43:13<08:55, 3.50s/it] Training 2/2 epoch (loss 0.0962): 82%|████████▏ | 688/840 [43:13<08:39, 3.42s/it] Training 2/2 epoch (loss 0.0732): 82%|████████▏ | 688/840 [43:16<08:39, 3.42s/it] Training 2/2 epoch (loss 0.0732): 82%|████████▏ | 689/840 [43:16<07:56, 3.15s/it] Training 2/2 epoch (loss 0.0830): 82%|████████▏ | 689/840 [43:19<07:56, 3.15s/it] Training 2/2 epoch (loss 0.0830): 82%|████████▏ | 690/840 [43:19<08:01, 3.21s/it] Training 2/2 epoch (loss 0.0413): 82%|████████▏ | 690/840 [43:24<08:01, 3.21s/it] Training 2/2 epoch (loss 0.0413): 82%|████████▏ | 691/840 [43:24<09:10, 3.69s/it] Training 2/2 epoch (loss 0.0435): 82%|████████▏ | 691/840 [43:27<09:10, 3.69s/it] Training 2/2 epoch (loss 0.0435): 82%|████████▏ | 692/840 [43:27<08:36, 3.49s/it] Training 2/2 epoch (loss 0.1216): 82%|████████▏ | 692/840 [43:31<08:36, 3.49s/it] Training 2/2 epoch (loss 0.1216): 82%|████████▎ | 693/840 [43:31<09:05, 3.71s/it] Training 2/2 epoch (loss 0.0742): 82%|████████▎ | 693/840 [43:35<09:05, 3.71s/it] Training 2/2 epoch (loss 0.0742): 83%|████████▎ | 694/840 [43:35<09:19, 3.83s/it] Training 2/2 epoch (loss 0.0598): 83%|████████▎ | 694/840
[43:40<09:19, 3.83s/it] Training 2/2 epoch (loss 0.0598): 83%|████████▎ | 695/840 [43:40<09:57, 4.12s/it] Training 2/2 epoch (loss 0.0598): 83%|████████▎ | 695/840 [43:45<09:57, 4.12s/it] Training 2/2 epoch (loss 0.0598): 83%|████████▎ | 696/840 [43:45<09:58, 4.16s/it] Training 2/2 epoch (loss 0.1553): 83%|████████▎ | 696/840 [43:48<09:58, 4.16s/it] Training 2/2 epoch (loss 0.1553): 83%|████████▎ | 697/840 [43:48<09:15, 3.89s/it] Training 2/2 epoch (loss 0.0801): 83%|████████▎ | 697/840 [43:52<09:15, 3.89s/it] Training 2/2 epoch (loss 0.0801): 83%|████████▎ | 698/840 [43:52<09:12, 3.89s/it] Training 2/2 epoch (loss 0.0645): 83%|████████▎ | 698/840 [43:55<09:12, 3.89s/it] Training 2/2 epoch (loss 0.0645): 83%|████████▎ | 699/840 [43:55<08:57, 3.81s/it] Training 2/2 epoch (loss 0.1133): 83%|████████▎ | 699/840 [43:58<08:57, 3.81s/it] Training 2/2 epoch (loss 0.1133): 83%|████████▎ | 700/840 [43:58<08:13, 3.52s/it] Training 2/2 epoch (loss 0.1035): 83%|████████▎ | 700/840 [44:02<08:13, 3.52s/it] Training 2/2 epoch (loss 0.1035): 83%|████████▎ | 701/840 [44:02<08:07, 3.51s/it] Training 2/2 epoch (loss 0.0693): 83%|████████▎ | 701/840 [44:05<08:07, 3.51s/it] Training 2/2 epoch (loss 0.0693): 84%|████████▎ | 702/840 [44:05<07:47, 3.39s/it] Training 2/2 epoch (loss 0.0552): 84%|████████▎ | 702/840 [44:09<07:47, 3.39s/it] Training 2/2 epoch (loss 0.0552): 84%|████████▎ | 703/840 [44:09<08:08, 3.56s/it] Training 2/2 epoch (loss 0.1104): 84%|████████▎ | 703/840 [44:12<08:08, 3.56s/it] Training 2/2 epoch (loss 0.1104): 84%|████████▍ | 704/840 [44:12<07:34, 3.34s/it] Training 2/2 epoch (loss 0.0347): 84%|████████▍ | 704/840
[44:15<07:34, 3.34s/it] Training 2/2 epoch (loss 0.0347): 84%|████████▍ | 705/840 [44:15<07:47, 3.46s/it] Training 2/2 epoch (loss 0.0396): 84%|████████▍ | 705/840 [44:19<07:47, 3.46s/it] Training 2/2 epoch (loss 0.0396): 84%|████████▍ | 706/840 [44:19<07:43, 3.46s/it] Training 2/2 epoch (loss 0.1357): 84%|████████▍ | 706/840 [44:22<07:43, 3.46s/it] Training 2/2 epoch (loss 0.1357): 84%|████████▍ | 707/840 [44:22<07:44, 3.49s/it] Training 2/2 epoch (loss 0.0503): 84%|████████▍ | 707/840 [44:25<07:44, 3.49s/it] Training 2/2 epoch (loss 0.0503): 84%|████████▍ | 708/840 [44:25<07:14, 3.29s/it] Training 2/2 epoch (loss 0.0786): 84%|████████▍ | 708/840 [44:28<07:14, 3.29s/it] Training 2/2 epoch (loss 0.0786): 84%|████████▍ | 709/840 [44:28<07:02, 3.23s/it] Training 2/2 epoch (loss 0.1523): 84%|████████▍ | 709/840 [44:32<07:02, 3.23s/it] Training 2/2 epoch (loss 0.1523): 85%|████████▍ | 710/840 [44:32<07:14, 3.35s/it] Training 2/2 epoch (loss 0.0143): 85%|████████▍ | 710/840 [44:36<07:14, 3.35s/it] Training 2/2 epoch (loss 0.0143): 85%|████████▍ | 711/840 [44:36<07:47, 3.62s/it] Training 2/2 epoch (loss 0.0266): 85%|████████▍ | 711/840 [44:41<07:47, 3.62s/it] Training 2/2 epoch (loss 0.0266): 85%|████████▍ | 712/840 [44:41<08:32, 4.01s/it] Training 2/2 epoch (loss 0.2812): 85%|████████▍ | 712/840 [44:45<08:32, 4.01s/it] Training 2/2 epoch (loss 0.2812): 85%|████████▍ | 713/840 [44:45<08:41, 4.11s/it] Training 2/2 epoch (loss 0.0557): 85%|████████▍ | 713/840 [44:48<08:41, 4.11s/it] Training 2/2 epoch (loss 0.0557): 85%|████████▌ | 714/840 [44:48<07:58, 3.80s/it] Training 2/2 epoch (loss 0.0742): 85%|████████▌ | 714/840
[44:51<07:58, 3.80s/it] Training 2/2 epoch (loss 0.0742): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 715/840 [44:51<07:15, 3.48s/it] Training 2/2 epoch (loss 0.0457): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 715/840 [44:55<07:15, 3.48s/it] Training 2/2 epoch (loss 0.0457): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 716/840 [44:55<07:12, 3.49s/it] Training 2/2 epoch (loss 0.1079): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 716/840 [44:58<07:12, 3.49s/it] Training 2/2 epoch (loss 0.1079): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 717/840 [44:58<07:15, 3.54s/it] Training 2/2 epoch (loss 0.0471): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 717/840 [45:02<07:15, 3.54s/it] Training 2/2 epoch (loss 0.0471): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 718/840 [45:02<07:20, 3.61s/it] Training 2/2 epoch (loss 0.1064): 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 718/840 [45:06<07:20, 3.61s/it] Training 2/2 epoch (loss 0.1064): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 719/840 [45:06<07:11, 3.57s/it] Training 2/2 epoch (loss 0.0182): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 719/840 [45:09<07:11, 3.57s/it] Training 2/2 epoch (loss 0.0182): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 720/840 [45:09<06:46, 3.39s/it] Training 2/2 epoch (loss 0.2119): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 720/840 [45:13<06:46, 3.39s/it] Training 2/2 epoch (loss 0.2119): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 721/840 [45:13<07:33, 3.81s/it] Training 2/2 epoch (loss 0.1689): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 721/840 [45:17<07:33, 3.81s/it] Training 2/2 epoch (loss 0.1689): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 722/840 [45:17<07:30, 3.81s/it] Training 2/2 epoch (loss 0.0479): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 722/840 [45:23<07:30, 3.81s/it] Training 2/2 epoch (loss 0.0479): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 723/840 [45:23<08:25, 4.32s/it] Training 2/2 epoch (loss 0.0776): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 723/840 [45:26<08:25, 4.32s/it] Training 2/2 epoch (loss 0.0776): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 724/840 [45:26<07:45, 4.01s/it] Training 2/2 epoch (loss 0.0762): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 724/840 
[45:30<07:45, 4.01s/it] Training 2/2 epoch (loss 0.0762): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 725/840 [45:30<07:28, 3.90s/it] Training 2/2 epoch (loss 0.1045): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 725/840 [45:35<07:28, 3.90s/it] Training 2/2 epoch (loss 0.1045): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 726/840 [45:35<08:00, 4.21s/it] Training 2/2 epoch (loss 0.0476): 86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 726/840 [45:38<08:00, 4.21s/it] Training 2/2 epoch (loss 0.0476): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 727/840 [45:38<07:34, 4.02s/it] Training 2/2 epoch (loss 0.0923): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 727/840 [45:41<07:34, 4.02s/it] Training 2/2 epoch (loss 0.0923): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 728/840 [45:41<07:04, 3.79s/it] Training 2/2 epoch (loss 0.0688): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 728/840 [45:45<07:04, 3.79s/it] Training 2/2 epoch (loss 0.0688): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 729/840 [45:45<06:56, 3.75s/it] Training 2/2 epoch (loss 0.0449): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 729/840 [45:48<06:56, 3.75s/it] Training 2/2 epoch (loss 0.0449): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 730/840 [45:48<06:34, 3.58s/it] Training 2/2 epoch (loss 0.0300): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 730/840 [45:52<06:34, 3.58s/it] Training 2/2 epoch (loss 0.0300): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 731/840 [45:52<06:31, 3.59s/it] Training 2/2 epoch (loss 0.1279): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 731/840 [45:55<06:31, 3.59s/it] Training 2/2 epoch (loss 0.1279): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 732/840 [45:55<06:19, 3.52s/it] Training 2/2 epoch (loss 0.0718): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 732/840 [45:58<06:19, 3.52s/it] Training 2/2 epoch (loss 0.0718): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 733/840 [45:58<05:42, 3.20s/it] Training 2/2 epoch (loss 0.0693): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 733/840 [46:02<05:42, 3.20s/it] Training 2/2 epoch (loss 0.0693): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 734/840 [46:02<06:08, 3.48s/it] Training 2/2 epoch (loss 0.0271): 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 734/840 
[46:06<06:08, 3.48s/it] Training 2/2 epoch (loss 0.0271): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 735/840 [46:06<06:32, 3.74s/it] Training 2/2 epoch (loss 0.0532): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 735/840 [46:10<06:32, 3.74s/it] Training 2/2 epoch (loss 0.0532): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 736/840 [46:10<06:24, 3.70s/it] Training 2/2 epoch (loss 0.0476): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 736/840 [46:14<06:24, 3.70s/it] Training 2/2 epoch (loss 0.0476): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 737/840 [46:14<06:39, 3.88s/it] Training 2/2 epoch (loss 0.1914): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 737/840 [46:17<06:39, 3.88s/it] Training 2/2 epoch (loss 0.1914): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 738/840 [46:17<06:12, 3.65s/it] Training 2/2 epoch (loss 0.1006): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 738/840 [46:23<06:12, 3.65s/it] Training 2/2 epoch (loss 0.1006): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 739/840 [46:23<07:02, 4.18s/it] Training 2/2 epoch (loss 0.0239): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 739/840 [46:25<07:02, 4.18s/it] Training 2/2 epoch (loss 0.0239): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 740/840 [46:25<06:20, 3.81s/it] Training 2/2 epoch (loss 0.0498): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 740/840 [46:31<06:20, 3.81s/it] Training 2/2 epoch (loss 0.0498): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 741/840 [46:31<07:02, 4.26s/it] Training 2/2 epoch (loss 0.1182): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 741/840 [46:35<07:02, 4.26s/it] Training 2/2 epoch (loss 0.1182): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 742/840 [46:35<07:03, 4.32s/it] Training 2/2 epoch (loss 0.0859): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 742/840 [46:40<07:03, 4.32s/it] Training 2/2 epoch (loss 0.0859): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 743/840 [46:40<07:06, 4.40s/it] Training 2/2 epoch (loss 0.0952): 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 743/840 [46:43<07:06, 4.40s/it] Training 2/2 epoch (loss 0.0952): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 744/840 [46:43<06:36, 4.13s/it] Training 2/2 epoch (loss 0.0583): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 744/840 
[46:46<06:36, 4.13s/it] Training 2/2 epoch (loss 0.0583): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 745/840 [46:46<05:57, 3.76s/it] Training 2/2 epoch (loss 0.2227): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 745/840 [46:51<05:57, 3.76s/it] Training 2/2 epoch (loss 0.2227): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 746/840 [46:51<06:20, 4.05s/it] Training 2/2 epoch (loss 0.0718): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 746/840 [46:56<06:20, 4.05s/it] Training 2/2 epoch (loss 0.0718): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 747/840 [46:56<06:54, 4.46s/it] Training 2/2 epoch (loss 0.0410): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 747/840 [47:00<06:54, 4.46s/it] Training 2/2 epoch (loss 0.0410): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 748/840 [47:00<06:31, 4.26s/it] Training 2/2 epoch (loss 0.0718): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 748/840 [47:04<06:31, 4.26s/it] Training 2/2 epoch (loss 0.0718): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 749/840 [47:04<06:02, 3.99s/it] Training 2/2 epoch (loss 0.0325): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 749/840 [47:07<06:02, 3.99s/it] Training 2/2 epoch (loss 0.0325): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 750/840 [47:07<05:35, 3.73s/it] Training 2/2 epoch (loss 0.0166): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 750/840 [47:10<05:35, 3.73s/it] Training 2/2 epoch (loss 0.0166): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 751/840 [47:10<05:16, 3.56s/it] Training 2/2 epoch (loss 0.0781): 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 751/840 [47:12<05:16, 3.56s/it] Training 2/2 epoch (loss 0.0781): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 752/840 [47:12<04:48, 3.27s/it] Training 2/2 epoch (loss 0.0938): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 752/840 [47:18<04:48, 3.27s/it] Training 2/2 epoch (loss 0.0938): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 753/840 [47:18<05:41, 3.93s/it] Training 2/2 epoch (loss 0.1172): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 753/840 [47:23<05:41, 3.93s/it] Training 2/2 epoch (loss 0.1172): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 754/840 [47:23<06:20, 4.42s/it] Training 2/2 epoch (loss 0.0337): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 754/840 
[47:29<06:20, 4.42s/it] Training 2/2 epoch (loss 0.0337): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 755/840 [47:29<06:43, 4.74s/it] Training 2/2 epoch (loss 0.0327): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 755/840 [47:33<06:43, 4.74s/it] Training 2/2 epoch (loss 0.0327): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 756/840 [47:33<06:08, 4.39s/it] Training 2/2 epoch (loss 0.0908): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 756/840 [47:36<06:08, 4.39s/it] Training 2/2 epoch (loss 0.0908): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 757/840 [47:36<05:47, 4.19s/it] Training 2/2 epoch (loss 0.1011): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 757/840 [47:39<05:47, 4.19s/it] Training 2/2 epoch (loss 0.1011): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 758/840 [47:39<05:17, 3.87s/it] Training 2/2 epoch (loss 0.0708): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 758/840 [47:43<05:17, 3.87s/it] Training 2/2 epoch (loss 0.0708): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 759/840 [47:43<05:00, 3.71s/it] Training 2/2 epoch (loss 0.0574): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 759/840 [47:47<05:00, 3.71s/it] Training 2/2 epoch (loss 0.0574): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 760/840 [47:47<05:04, 3.80s/it] Training 2/2 epoch (loss 0.0211): 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 760/840 [47:50<05:04, 3.80s/it] Training 2/2 epoch (loss 0.0211): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 761/840 [47:50<04:44, 3.61s/it] Training 2/2 epoch (loss 0.0322): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 761/840 [47:54<04:44, 3.61s/it] Training 2/2 epoch (loss 0.0322): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 762/840 [47:54<04:46, 3.67s/it] Training 2/2 epoch (loss 0.1357): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 762/840 [47:57<04:46, 3.67s/it] Training 2/2 epoch (loss 0.1357): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 763/840 [47:57<04:27, 3.47s/it] Training 2/2 epoch (loss 0.1289): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 763/840 [48:01<04:27, 3.47s/it] Training 2/2 epoch (loss 0.1289): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 764/840 [48:01<04:48, 3.79s/it] Training 2/2 epoch (loss 0.1123): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 764/840 
[48:04<04:48, 3.79s/it] Training 2/2 epoch (loss 0.1123): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 765/840 [48:04<04:18, 3.44s/it] Training 2/2 epoch (loss 0.1367): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 765/840 [48:07<04:18, 3.44s/it] Training 2/2 epoch (loss 0.1367): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 766/840 [48:07<03:56, 3.20s/it] Training 2/2 epoch (loss 0.0620): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 766/840 [48:09<03:56, 3.20s/it] Training 2/2 epoch (loss 0.0620): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 767/840 [48:09<03:41, 3.04s/it] Training 2/2 epoch (loss 0.0105): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 767/840 [48:12<03:41, 3.04s/it] Training 2/2 epoch (loss 0.0105): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 768/840 [48:12<03:43, 3.11s/it] Training 2/2 epoch (loss 0.0264): 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 768/840 [48:16<03:43, 3.11s/it] Training 2/2 epoch (loss 0.0264): 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 769/840 [48:16<03:59, 3.38s/it] Training 2/2 epoch (loss 0.0366): 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 769/840 [48:19<03:59, 3.38s/it] Training 2/2 epoch (loss 0.0366): 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 770/840 [48:19<03:43, 3.19s/it] Training 2/2 epoch (loss 0.0591): 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 770/840 [48:25<03:43, 3.19s/it] Training 2/2 epoch (loss 0.0591): 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 771/840 [48:25<04:26, 3.86s/it] Training 2/2 epoch (loss 0.1289): 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 771/840 [48:28<04:26, 3.86s/it] Training 2/2 epoch (loss 0.1289): 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 772/840 [48:28<04:06, 3.62s/it] Training 2/2 epoch (loss 0.1240): 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 772/840 [48:31<04:06, 3.62s/it] Training 2/2 epoch (loss 0.1240): 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 773/840 [48:31<04:02, 3.62s/it] Training 2/2 epoch (loss 0.0559): 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 773/840 [48:35<04:02, 3.62s/it] Training 2/2 epoch (loss 0.0559): 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 774/840 [48:35<03:57, 3.60s/it] Training 2/2 epoch (loss 0.0889): 
92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 774/840 [48:38<03:57, 3.60s/it] Training 2/2 epoch (loss 0.0889): 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 775/840 [48:38<03:45, 3.47s/it] Training 2/2 epoch (loss 0.0742): 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 775/840 [48:42<03:45, 3.47s/it] Training 2/2 epoch (loss 0.0742): 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 776/840 [48:42<03:52, 3.63s/it] Training 2/2 epoch (loss 0.0157): 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 776/840 [48:47<03:52, 3.63s/it] Training 2/2 epoch (loss 0.0157): 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 777/840 [48:47<04:12, 4.00s/it] Training 2/2 epoch (loss 0.1455): 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 777/840 [48:51<04:12, 4.00s/it] Training 2/2 epoch (loss 0.1455): 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 778/840 [48:51<04:18, 4.17s/it] Training 2/2 epoch (loss 0.0698): 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 778/840 [48:57<04:18, 4.17s/it] Training 2/2 epoch (loss 0.0698): 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 779/840 [48:57<04:37, 4.55s/it] Training 2/2 epoch (loss 0.0591): 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 779/840 [49:01<04:37, 4.55s/it] Training 2/2 epoch (loss 0.0591): 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 780/840 [49:01<04:24, 4.40s/it] Training 2/2 epoch (loss 0.0077): 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 780/840 [49:05<04:24, 4.40s/it] Training 2/2 epoch (loss 0.0077): 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 781/840 [49:05<04:12, 4.28s/it] Training 2/2 epoch (loss 0.1226): 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 781/840 [49:08<04:12, 4.28s/it] Training 2/2 epoch (loss 0.1226): 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 782/840 [49:08<03:43, 3.85s/it] Training 2/2 epoch (loss 0.0391): 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 782/840 [49:11<03:43, 3.85s/it] Training 2/2 epoch (loss 0.0391): 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 783/840 [49:11<03:33, 3.75s/it] Training 2/2 epoch (loss 0.0635): 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 783/840 [49:16<03:33, 3.75s/it] Training 2/2 epoch (loss 0.0635): 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 784/840 [49:16<03:52, 
4.15s/it] Training 2/2 epoch (loss 0.1543): 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 784/840 [49:20<03:52, 4.15s/it] Training 2/2 epoch (loss 0.1543): 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 785/840 [49:20<03:40, 4.00s/it] Training 2/2 epoch (loss 0.0698): 93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 785/840 [49:26<03:40, 4.00s/it] Training 2/2 epoch (loss 0.0698): 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 786/840 [49:26<04:00, 4.45s/it] Training 2/2 epoch (loss 0.0493): 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 786/840 [49:29<04:00, 4.45s/it] Training 2/2 epoch (loss 0.0493): 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 787/840 [49:29<03:41, 4.17s/it] Training 2/2 epoch (loss 0.0518): 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 787/840 [49:34<03:41, 4.17s/it] Training 2/2 epoch (loss 0.0518): 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 788/840 [49:34<03:43, 4.30s/it] Training 2/2 epoch (loss 0.2695): 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 788/840 [49:37<03:43, 4.30s/it] Training 2/2 epoch (loss 0.2695): 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 789/840 [49:37<03:21, 3.96s/it] Training 2/2 epoch (loss 0.4727): 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 789/840 [49:42<03:21, 3.96s/it] Training 2/2 epoch (loss 0.4727): 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 790/840 [49:42<03:41, 4.43s/it] Training 2/2 epoch (loss 0.0918): 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 790/840 [49:48<03:41, 4.43s/it] Training 2/2 epoch (loss 0.0918): 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 791/840 [49:48<03:52, 4.74s/it] Training 2/2 epoch (loss 0.0255): 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 791/840 [49:53<03:52, 4.74s/it] Training 2/2 epoch (loss 0.0255): 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 792/840 [49:53<03:58, 4.97s/it] Training 2/2 epoch (loss 0.0214): 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 792/840 [49:57<03:58, 4.97s/it] Training 2/2 epoch (loss 0.0214): 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 793/840 [49:57<03:31, 4.50s/it] Training 2/2 epoch (loss 0.0361): 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 793/840 [50:00<03:31, 4.50s/it] Training 2/2 epoch (loss 0.0361): 
95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 794/840 [50:00<03:13, 4.22s/it] Training 2/2 epoch (loss 0.0684): 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 794/840 [50:05<03:13, 4.22s/it] Training 2/2 epoch (loss 0.0684): 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 795/840 [50:05<03:14, 4.31s/it] Training 2/2 epoch (loss 0.0306): 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 795/840 [50:10<03:14, 4.31s/it] Training 2/2 epoch (loss 0.0306): 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 796/840 [50:10<03:17, 4.49s/it] Training 2/2 epoch (loss 0.0079): 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 796/840 [50:14<03:17, 4.49s/it] Training 2/2 epoch (loss 0.0079): 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 797/840 [50:14<03:06, 4.33s/it] Training 2/2 epoch (loss 0.0544): 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 797/840 [50:17<03:06, 4.33s/it] Training 2/2 epoch (loss 0.0544): 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 798/840 [50:17<02:54, 4.16s/it] Training 2/2 epoch (loss 0.0205): 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 798/840 [50:20<02:54, 4.16s/it] Training 2/2 epoch (loss 0.0205): 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 799/840 [50:20<02:32, 3.72s/it] Training 2/2 epoch (loss 0.0732): 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 799/840 [50:25<02:32, 3.72s/it] Training 2/2 epoch (loss 0.0732): 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 800/840 [50:25<02:42, 4.05s/it] Training 2/2 epoch (loss 0.0366): 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 800/840 [50:28<02:42, 4.05s/it] Training 2/2 epoch (loss 0.0366): 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 801/840 [50:28<02:23, 3.67s/it] Training 2/2 epoch (loss 0.0310): 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 801/840 [50:33<02:23, 3.67s/it] Training 2/2 epoch (loss 0.0310): 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 802/840 [50:33<02:33, 4.05s/it] Training 2/2 epoch (loss 0.0332): 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 802/840 [50:36<02:33, 4.05s/it] Training 2/2 epoch (loss 0.0332): 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 803/840 [50:36<02:20, 3.79s/it] Training 2/2 epoch (loss 0.2227): 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 803/840 [50:39<02:20, 
3.79s/it] Training 2/2 epoch (loss 0.2227): 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 804/840 [50:39<02:14, 3.73s/it] Training 2/2 epoch (loss 0.0654): 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 804/840 [50:43<02:14, 3.73s/it] Training 2/2 epoch (loss 0.0654): 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 805/840 [50:43<02:03, 3.53s/it] Training 2/2 epoch (loss 0.0537): 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 805/840 [50:46<02:03, 3.53s/it] Training 2/2 epoch (loss 0.0537): 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 806/840 [50:46<01:54, 3.37s/it] Training 2/2 epoch (loss 0.0986): 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 806/840 [50:48<01:54, 3.37s/it] Training 2/2 epoch (loss 0.0986): 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 807/840 [50:48<01:45, 3.20s/it] Training 2/2 epoch (loss 0.1133): 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 807/840 [50:53<01:45, 3.20s/it] Training 2/2 epoch (loss 0.1133): 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 808/840 [50:53<01:52, 3.52s/it] Training 2/2 epoch (loss 0.0403): 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 808/840 [50:55<01:52, 3.52s/it] Training 2/2 epoch (loss 0.0403): 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 809/840 [50:55<01:42, 3.29s/it] Training 2/2 epoch (loss 0.0062): 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 809/840 [50:58<01:42, 3.29s/it] Training 2/2 epoch (loss 0.0062): 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 810/840 [50:58<01:36, 3.21s/it] Training 2/2 epoch (loss 0.0194): 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 810/840 [51:02<01:36, 3.21s/it] Training 2/2 epoch (loss 0.0194): 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 811/840 [51:02<01:32, 3.19s/it] Training 2/2 epoch (loss 0.0359): 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 811/840 [51:04<01:32, 3.19s/it] Training 2/2 epoch (loss 0.0359): 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 812/840 [51:04<01:22, 2.96s/it] Training 2/2 epoch (loss 0.0425): 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 812/840 [51:07<01:22, 2.96s/it] Training 2/2 epoch (loss 0.0425): 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 813/840 [51:07<01:24, 3.12s/it] Training 2/2 epoch (loss 0.0284): 
97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 813/840 [51:11<01:24, 3.12s/it] Training 2/2 epoch (loss 0.0284): 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 814/840 [51:11<01:21, 3.12s/it] Training 2/2 epoch (loss 0.0483): 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 814/840 [51:13<01:21, 3.12s/it] Training 2/2 epoch (loss 0.0483): 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 815/840 [51:13<01:13, 2.96s/it] Training 2/2 epoch (loss 0.0396): 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 815/840 [51:17<01:13, 2.96s/it] Training 2/2 epoch (loss 0.0396): 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 816/840 [51:17<01:13, 3.08s/it] Training 2/2 epoch (loss 0.0383): 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 816/840 [51:20<01:13, 3.08s/it] Training 2/2 epoch (loss 0.0383): 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 817/840 [51:20<01:16, 3.33s/it] Training 2/2 epoch (loss 0.0718): 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 817/840 [51:24<01:16, 3.33s/it] Training 2/2 epoch (loss 0.0718): 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 818/840 [51:24<01:13, 3.35s/it] Training 2/2 epoch (loss 0.1543): 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 818/840 [51:29<01:13, 3.35s/it] Training 2/2 epoch (loss 0.1543): 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 819/840 [51:29<01:23, 3.98s/it] Training 2/2 epoch (loss 0.0223): 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 819/840 [51:35<01:23, 3.98s/it] Training 2/2 epoch (loss 0.0223): 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 820/840 [51:35<01:29, 4.47s/it] Training 2/2 epoch (loss 0.0791): 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 820/840 [51:39<01:29, 4.47s/it] Training 2/2 epoch (loss 0.0791): 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 821/840 [51:39<01:22, 4.32s/it] Training 2/2 epoch (loss 0.0574): 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 821/840 [51:44<01:22, 4.32s/it] Training 2/2 epoch (loss 0.0574): 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 822/840 [51:44<01:24, 4.68s/it] Training 2/2 epoch (loss 0.0101): 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 822/840 [51:50<01:24, 4.68s/it] Training 2/2 epoch (loss 0.0101): 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 823/840 [51:50<01:23, 
4.90s/it] Training 2/2 epoch (loss 0.0542): 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 823/840 [51:53<01:23, 4.90s/it] Training 2/2 epoch (loss 0.0542): 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 824/840 [51:53<01:09, 4.34s/it] Training 2/2 epoch (loss 0.0564): 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 824/840 [51:55<01:09, 4.34s/it] Training 2/2 epoch (loss 0.0564): 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 825/840 [51:55<00:56, 3.78s/it] Training 2/2 epoch (loss 0.1426): 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 825/840 [52:00<00:56, 3.78s/it] Training 2/2 epoch (loss 0.1426): 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 826/840 [52:00<00:56, 4.03s/it] Training 2/2 epoch (loss 0.0206): 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 826/840 [52:05<00:56, 4.03s/it] Training 2/2 epoch (loss 0.0206): 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 827/840 [52:05<00:54, 4.22s/it] Training 2/2 epoch (loss 0.1108): 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 827/840 [52:09<00:54, 4.22s/it] Training 2/2 epoch (loss 0.1108): 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 828/840 [52:09<00:52, 4.39s/it] Training 2/2 epoch (loss 0.0576): 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 828/840 [52:12<00:52, 4.39s/it] Training 2/2 epoch (loss 0.0576): 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 829/840 [52:12<00:42, 3.86s/it] Training 2/2 epoch (loss 0.0596): 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 829/840 [52:18<00:42, 3.86s/it] Training 2/2 epoch (loss 0.0596): 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 830/840 [52:18<00:43, 4.37s/it] Training 2/2 epoch (loss 0.0610): 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 830/840 [52:20<00:43, 4.37s/it] Training 2/2 epoch (loss 0.0610): 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 831/840 [52:20<00:35, 3.93s/it] Training 2/2 epoch (loss 0.0079): 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 831/840 [52:24<00:35, 3.93s/it] Training 2/2 epoch (loss 0.0079): 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 832/840 [52:24<00:29, 3.73s/it] Training 2/2 epoch (loss 0.0393): 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 832/840 [52:27<00:29, 3.73s/it] Training 2/2 epoch (loss 0.0393): 
99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 833/840 [52:27<00:24, 3.53s/it] Training 2/2 epoch (loss 0.0503): 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 833/840 [52:31<00:24, 3.53s/it] Training 2/2 epoch (loss 0.0503): 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 834/840 [52:31<00:22, 3.78s/it] Training 2/2 epoch (loss 0.0415): 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 834/840 [52:35<00:22, 3.78s/it] Training 2/2 epoch (loss 0.0415): 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 835/840 [52:35<00:19, 3.91s/it] Training 2/2 epoch (loss 0.0461): 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 835/840 [52:40<00:19, 3.91s/it] Training 2/2 epoch (loss 0.0461): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 836/840 [52:40<00:16, 4.15s/it] Training 2/2 epoch (loss 0.0581): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 836/840 [52:43<00:16, 4.15s/it] Training 2/2 epoch (loss 0.0581): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 837/840 [52:43<00:11, 3.90s/it] Training 2/2 epoch (loss 0.0503): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 837/840 [52:47<00:11, 3.90s/it] Training 2/2 epoch (loss 0.0503): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 838/840 [52:47<00:07, 3.86s/it] Training 2/2 epoch (loss 0.1348): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 838/840 [52:50<00:07, 3.86s/it] Training 2/2 epoch (loss 0.1348): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 839/840 [52:50<00:03, 3.63s/it] Training 2/2 epoch (loss 0.0248): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 839/840 [52:54<00:03, 3.63s/it] Training 2/2 epoch (loss 0.0248): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 840/840 [52:54<00:00, 3.64s/it] Training 2/2 epoch (loss 0.0248): 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 840/840 [52:54<00:00, 3.78s/it]
/data/jiongxiao_wang/anaconda3/envs/safe-rlhf/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
wandb: / 0.033 MB of 0.033 MB uploaded
wandb:
wandb:
wandb: Run history:
wandb: train/accuracy β–‚β–β–„β–…β–ƒβ–ƒβ–„β–…β–…β–…β–…β–…β–„β–…β–„β–…β–„β–„β–…β–…β–…β–†β–…β–‡β–‡β–†β–‡β–‡β–ˆβ–‡β–‡β–ˆβ–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
wandb: train/epoch β–β–β–β–‚β–‚β–‚β–‚β–‚β–‚β–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–„β–„β–„β–„β–„β–…β–…β–…β–…β–…β–…β–†β–†β–†β–†β–†β–‡β–‡β–‡β–‡β–‡β–‡β–ˆβ–ˆβ–ˆ
wandb: train/loss β–ˆβ–ˆβ–‡β–‡β–ˆβ–‡β–ˆβ–‡β–‡β–‡β–‡β–†β–†β–†β–‡β–…β–†β–†β–†β–†β–†β–„β–…β–ƒβ–‚β–ƒβ–ƒβ–‚β–β–ƒβ–‚β–β–‚β–β–‚β–‚β–‚β–β–β–
wandb: train/lr β–ƒβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–‡β–‡β–‡β–‡β–†β–†β–†β–†β–…β–…β–…β–…β–„β–„β–„β–ƒβ–ƒβ–ƒβ–ƒβ–‚β–‚β–‚β–‚β–‚β–β–β–β–β–β–β–
wandb: train/step β–β–β–β–‚β–‚β–‚β–‚β–‚β–‚β–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–„β–„β–„β–„β–„β–…β–…β–…β–…β–…β–…β–†β–†β–†β–†β–†β–‡β–‡β–‡β–‡β–‡β–‡β–ˆβ–ˆβ–ˆ
wandb:
wandb: Run summary:
wandb: train/accuracy 0.98333
wandb: train/epoch 2.0
wandb: train/loss 0.02478
wandb: train/lr 0.0
wandb: train/step 840
wandb:
wandb: 🚀 View run reward-2024-01-05-20-03-25 at: https://wandb.ai/jayfeather1024/Safe-RLHF-RM/runs/0bh9htd8
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./output/rm_30k/wandb/run-20240105_200327-0bh9htd8/logs