[2022-12-16 19:19:56,902] [WARNING] [runner.py:179:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only. [2022-12-16 19:19:56,998] [INFO] [runner.py:508:main] cmd = /home/milan/hf_env/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 run_speech_recognition_seq2seq_streaming.py --deepspeed=ds_config.json --model_name_or_path=openai/whisper-large-v2 --dataset_name=mozilla-foundation/common_voice_11_0 --dataset_config_name=cs --language=czech --train_split_name=train+validation --eval_split_name=test --model_index_name=Whisper Large-v2 Czech CV11 v2 --max_steps=5000 --output_dir=./ --per_device_train_batch_size=32 --per_device_eval_batch_size=8 --gradient_accumulation_steps=2 --logging_steps=25 --learning_rate=1e-5 --warmup_steps=500 --evaluation_strategy=steps --eval_steps=1000 --save_strategy=steps --save_steps=1000 --generation_max_length=225 --length_column_name=input_length --max_duration_in_seconds=30 --text_column_name=sentence --freeze_feature_encoder=False --report_to=tensorboard --metric_for_best_model=wer --greater_is_better=False --load_best_model_at_end --gradient_checkpointing --fp16 --overwrite_output_dir --do_train --do_eval --predict_with_generate --do_normalize_eval --streaming=False --use_auth_token --push_to_hub [2022-12-16 19:19:58,537] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0]} [2022-12-16 19:19:58,537] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=1, node_rank=0 [2022-12-16 19:19:58,537] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]}) [2022-12-16 19:19:58,537] [INFO] [launch.py:162:main] dist_world_size=1 [2022-12-16 19:19:58,537] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0 [2022-12-16 19:20:02,860] [INFO] [comm.py:654:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl 12/16/2022 19:20:03 - WARNING - __main__ - Process rank: 0, 
device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True 12/16/2022 19:20:03 - INFO - __main__ - Training/evaluation parameters Seq2SeqTrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=False, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=ds_config.json, disable_tqdm=False, do_eval=True, do_predict=False, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_steps=1000, evaluation_strategy=steps, fp16=True, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, generation_max_length=225, generation_num_beams=None, gradient_accumulation_steps=2, gradient_checkpointing=True, greater_is_better=False, group_by_length=False, half_precision_backend=auto, hub_model_id=None, hub_private_repo=False, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_inputs_for_metrics=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=1e-05, length_column_name=input_length, load_best_model_at_end=True, local_rank=0, log_level=passive, log_level_replica=passive, log_on_each_node=True, logging_dir=./runs/Dec16_19-20-02_129-146-123-136, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=25, logging_strategy=steps, lr_scheduler_type=linear, max_grad_norm=1.0, max_steps=5000, metric_for_best_model=wer, mp_parameters=, no_cuda=False, num_train_epochs=3.0, optim=adamw_hf, optim_args=None, output_dir=./, overwrite_output_dir=True, past_index=-1, per_device_eval_batch_size=8, per_device_train_batch_size=32, predict_with_generate=True, prediction_loss_only=False, push_to_hub=True, push_to_hub_model_id=None, push_to_hub_organization=None, 
push_to_hub_token=, ray_scope=last, remove_unused_columns=True, report_to=['tensorboard'], resume_from_checkpoint=None, run_name=./, save_on_each_node=False, save_steps=1000, save_strategy=steps, save_total_limit=None, seed=42, sharded_ddp=[], skip_memory_metrics=True, sortish_sampler=False, tf32=None, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.0, warmup_steps=500, weight_decay=0.0, xpu_backend=None, ) 
12/16/2022 19:20:05 - INFO - datasets.info - Loading Dataset Infos from /home/milan/.cache/huggingface/modules/datasets_modules/datasets/mozilla-foundation--common_voice_11_0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f 12/16/2022 19:20:05 - INFO - datasets.builder - Overwrite dataset info from restored data version. 
12/16/2022 19:20:05 - INFO - datasets.info - Loading Dataset info from /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f 12/16/2022 19:20:05 - WARNING - datasets.builder - Found cached dataset common_voice_11_0 (/home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f) 12/16/2022 19:20:05 - INFO - datasets.info - Loading Dataset info from /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f 12/16/2022 19:20:06 - INFO - datasets.info - Loading Dataset Infos from /home/milan/.cache/huggingface/modules/datasets_modules/datasets/mozilla-foundation--common_voice_11_0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f 12/16/2022 19:20:06 - INFO - datasets.builder - Overwrite dataset info from restored data version. 
12/16/2022 19:20:06 - INFO - datasets.info - Loading Dataset info from /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f 12/16/2022 19:20:06 - WARNING - datasets.builder - Found cached dataset common_voice_11_0 (/home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f) 12/16/2022 19:20:06 - INFO - datasets.info - Loading Dataset info from /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f 12/16/2022 19:20:08 - INFO - datasets.info - Loading Dataset Infos from /home/milan/.cache/huggingface/modules/datasets_modules/datasets/mozilla-foundation--common_voice_11_0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f 12/16/2022 19:20:08 - INFO - datasets.builder - Overwrite dataset info from restored data version. 
12/16/2022 19:20:08 - INFO - datasets.info - Loading Dataset info from /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f 12/16/2022 19:20:08 - WARNING - datasets.builder - Found cached dataset common_voice_11_0 (/home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f) 12/16/2022 19:20:08 - INFO - datasets.info - Loading Dataset info from /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f 12/16/2022 19:20:27 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f/cache-3d5c448b6a2bf0f7.arrow 12/16/2022 19:20:29 - INFO - datasets.arrow_dataset - Caching processed dataset at /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f/cache-73e3a5936553e76c.arrow 12/16/2022 19:40:11 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/milan/.cache/huggingface/datasets/mozilla-foundation___common_voice_11_0/cs/11.0.0/f8e47235d9b4e68fa24ed71d63266a02018ccf7194b2a8c9c598a5f3ab304d9f/cache-3470671e4cfe112f.arrow 12/16/2022 19:40:13 - WARNING - huggingface_hub.repository - /home/milan/whisper-large2-czech-cv11-v2/./ is already a clone of https://huggingface.co/mikr/whisper-large2-czech-cv11-v2. Make sure you pull the latest changes with `repo.git_pull()`. 
[2022-12-16 19:40:17,786] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.7, git-hash=unknown, git-branch=unknown [2022-12-16 19:40:18,780] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [2022-12-16 19:40:19,982] [WARNING] [cpu_adam.py:83:__init__] FP16 params for CPUAdam may not work on AMD CPUs Installed CUDA version 11.6 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination ninja: no work to do. Time to load cpu_adam op: 3.031318426132202 seconds Adam Optimizer #0 is created with AVX2 arithmetic capability. Config: alpha=0.000010, betas=(0.900000, 0.999000), weight_decay=0.000000, adam_w=1 [2022-12-16 19:40:24,909] [INFO] [logging.py:68:log_dist] [Rank 0] Using DeepSpeed Optimizer param name adamw as basic optimizer [2022-12-16 19:40:25,211] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = DeepSpeedCPUAdam [2022-12-16 19:40:25,212] [INFO] [utils.py:52:is_zero_supported_optimizer] Checking ZeRO support for optimizer=DeepSpeedCPUAdam type=<class 'deepspeed.ops.adam.cpu_adam.DeepSpeedCPUAdam'> [2022-12-16 19:40:25,212] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 2 optimizer [2022-12-16 19:40:25,212] [INFO] [stage_1_and_2.py:140:__init__] Reduce bucket size 200000000 [2022-12-16 19:40:25,212] [INFO] [stage_1_and_2.py:141:__init__] Allgather bucket size 200000000 [2022-12-16 19:40:25,212] [INFO] [stage_1_and_2.py:142:__init__] CPU Offload: True [2022-12-16 19:40:25,212] [INFO] [stage_1_and_2.py:143:__init__] Round robin gradient partitioning: False ninja: no work to do. 
Time to load utils op: 0.5200150012969971 seconds Rank: 0 partition count [1] and sizes[(1543304960, False)] [2022-12-16 19:40:29,582] [INFO] [utils.py:827:see_memory_usage] Before initializing optimizer states [2022-12-16 19:40:29,583] [INFO] [utils.py:828:see_memory_usage] MA 3.0 GB Max_MA 3.0 GB CA 5.99 GB Max_CA 6 GB [2022-12-16 19:40:29,583] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 15.46 GB, percent = 7.9% [2022-12-16 19:40:33,634] [INFO] [utils.py:827:see_memory_usage] After initializing optimizer states [2022-12-16 19:40:33,634] [INFO] [utils.py:828:see_memory_usage] MA 3.0 GB Max_MA 3.0 GB CA 5.99 GB Max_CA 6 GB [2022-12-16 19:40:33,635] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 35.2 GB, percent = 17.9% [2022-12-16 19:40:33,635] [INFO] [stage_1_and_2.py:525:__init__] optimizer state initialized [2022-12-16 19:40:33,721] [INFO] [utils.py:827:see_memory_usage] After initializing ZeRO optimizer [2022-12-16 19:40:33,722] [INFO] [utils.py:828:see_memory_usage] MA 3.0 GB Max_MA 3.0 GB CA 5.99 GB Max_CA 6 GB [2022-12-16 19:40:33,723] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 35.13 GB, percent = 17.9% [2022-12-16 19:40:33,756] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = adamw [2022-12-16 19:40:33,756] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using configured LR scheduler = WarmupDecayLR [2022-12-16 19:40:33,757] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2022-12-16 19:40:33,757] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[1e-05], mom=[[0.9, 0.999]] [2022-12-16 19:40:33,759] [INFO] [config.py:1020:print] DeepSpeedEngine configuration: [2022-12-16 19:40:33,759] [INFO] [config.py:1024:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } 
[2022-12-16 19:40:33,759] [INFO] [config.py:1024:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2022-12-16 19:40:33,759] [INFO] [config.py:1024:print] amp_enabled .................. False [2022-12-16 19:40:33,759] [INFO] [config.py:1024:print] amp_params ................... False [2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] bfloat16_enabled ............. False [2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] checkpoint_parallel_write_pipeline False [2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] checkpoint_tag_validation_enabled True [2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] checkpoint_tag_validation_fail False [2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] comms_config ................. [2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] communication_data_type ...... None [2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] curriculum_enabled ........... False [2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] curriculum_params ............ False [2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] dataloader_drop_last ......... False [2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] disable_allgather ............ False [2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] dump_state ................... False [2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 1000, 'delayed_shift': 2, 'min_scale': 1} [2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] eigenvalue_enabled ........... False [2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] eigenvalue_gas_boundary_resolution 1 [2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] eigenvalue_layer_name ........ 
bert.encoder.layer [2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] eigenvalue_layer_num ......... 0 [2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] eigenvalue_max_iter .......... 100 [2022-12-16 19:40:33,760] [INFO] [config.py:1024:print] eigenvalue_stability ......... 1e-06 [2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] eigenvalue_tol ............... 0.01 [2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] eigenvalue_verbose ........... False [2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] elasticity_enabled ........... False [2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] fp16_auto_cast ............... False [2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] fp16_enabled ................. True [2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] fp16_master_weights_and_gradients False [2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] global_rank .................. 0 [2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] grad_accum_dtype ............. None [2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] gradient_accumulation_steps .. 2 [2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] gradient_clipping ............ 1.0 [2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] gradient_predivide_factor .... 1.0 [2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] initial_dynamic_scale ........ 65536 [2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] load_universal_checkpoint .... False [2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] loss_scale ................... 0 [2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] memory_breakdown ............. False [2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] monitor_config ............... 
[2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null } [2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] optimizer_legacy_fusion ...... False [2022-12-16 19:40:33,761] [INFO] [config.py:1024:print] optimizer_name ............... adamw [2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] optimizer_params ............. {'lr': 1e-05, 'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0.0} [2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] pld_enabled .................. False [2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] pld_params ................... False [2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] prescale_gradients ........... False [2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] scheduler_name ............... WarmupDecayLR [2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] scheduler_params ............. {'last_batch_iteration': -1, 'total_num_steps': 5000, 'warmup_min_lr': 0, 'warmup_max_lr': 1e-05, 'warmup_num_steps': 500} [2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] sparse_attention ............. None [2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] sparse_gradients_enabled ..... False [2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] steps_per_print .............. 10 [2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] train_batch_size ............. 64 [2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] train_micro_batch_size_per_gpu 32 [2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] use_node_local_storage ....... 
False [2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] wall_clock_breakdown ......... False [2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] world_size ................... 1 [2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] zero_allow_untested_optimizer False [2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=200000000 allgather_partitions=True allgather_bucket_size=200000000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='cpu', nvme_path=None, buffer_count=4, pin_memory=True, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False [2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] zero_enabled ................. True [2022-12-16 19:40:33,762] [INFO] [config.py:1024:print] zero_optimization_stage ...... 
2 [2022-12-16 19:40:33,763] [INFO] [config.py:1009:print_user_config] json = { "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 1000, "initial_scale_power": 16, "hysteresis": 2, "min_loss_scale": 1 }, "optimizer": { "type": "AdamW", "params": { "lr": 1e-05, "betas": [0.9, 0.999], "eps": 1e-08, "weight_decay": 0.0 } }, "scheduler": { "type": "WarmupDecayLR", "params": { "last_batch_iteration": -1, "total_num_steps": 5.000000e+03, "warmup_min_lr": 0, "warmup_max_lr": 1e-05, "warmup_num_steps": 500 } }, "zero_optimization": { "stage": 2, "offload_optimizer": { "device": "cpu", "pin_memory": true }, "allgather_partitions": true, "allgather_bucket_size": 2.000000e+08, "overlap_comm": true, "reduce_scatter": true, "reduce_bucket_size": 2.000000e+08, "contiguous_gradients": true }, "gradient_accumulation_steps": 2, "gradient_clipping": 1.0, "train_batch_size": 64, "train_micro_batch_size_per_gpu": 32 } Time to load utils op: 0.0003948211669921875 seconds [2022-12-16 19:40:58,606] [INFO] [timer.py:197:stop] 0/4, RunningAvgSamplesPerSec=6.327062880977527, CurrSamplesPerSec=5.683973434872449, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:41:09,925] [INFO] [timer.py:197:stop] 0/6, RunningAvgSamplesPerSec=6.337890979134199, CurrSamplesPerSec=5.698936745189652, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:41:21,308] [INFO] [timer.py:197:stop] 0/8, RunningAvgSamplesPerSec=6.3294469227923305, CurrSamplesPerSec=5.6523551541575205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:41:33,017] [INFO] [timer.py:197:stop] 0/10, RunningAvgSamplesPerSec=6.328546175322321, CurrSamplesPerSec=5.701343759212486, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:41:44,654] [INFO] [timer.py:197:stop] 0/12, RunningAvgSamplesPerSec=6.330046764141762, CurrSamplesPerSec=5.7140661343466865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:41:56,029] [INFO] [timer.py:197:stop] 0/14, RunningAvgSamplesPerSec=6.327367592679242, 
CurrSamplesPerSec=5.687009205382302, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:42:07,620] [INFO] [timer.py:197:stop] 0/16, RunningAvgSamplesPerSec=6.324036355417439, CurrSamplesPerSec=5.67106537300076, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:42:19,208] [INFO] [timer.py:197:stop] 0/18, RunningAvgSamplesPerSec=6.324517029843766, CurrSamplesPerSec=5.686187866744037, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:42:30,000] [INFO] [stage_1_and_2.py:1765:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 65536 [2022-12-16 19:42:30,002] [INFO] [logging.py:68:log_dist] [Rank 0] step=10, skipped=1, lr=[3.535580269163017e-06], mom=[[0.9, 0.999]] [2022-12-16 19:42:30,003] [INFO] [timer.py:197:stop] 0/20, RunningAvgSamplesPerSec=6.364053760974696, CurrSamplesPerSec=6.352128353972973, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:42:41,491] [INFO] [timer.py:197:stop] 0/22, RunningAvgSamplesPerSec=6.359150016371227, CurrSamplesPerSec=5.681020735272481, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:42:53,045] [INFO] [timer.py:197:stop] 0/24, RunningAvgSamplesPerSec=6.356683117345163, CurrSamplesPerSec=5.686370954837155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:43:03,984] [INFO] [stage_1_and_2.py:1765:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 65536, reducing to 32768.0 [2022-12-16 19:43:03,986] [INFO] [timer.py:197:stop] 0/26, RunningAvgSamplesPerSec=6.376183713003614, CurrSamplesPerSec=6.175548481452842, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:43:15,357] [INFO] [timer.py:197:stop] 0/28, RunningAvgSamplesPerSec=6.370924228753169, CurrSamplesPerSec=5.667562406103809, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:43:26,742] [INFO] [timer.py:197:stop] 0/30, RunningAvgSamplesPerSec=6.366664670767918, CurrSamplesPerSec=5.69276426047378, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:43:38,322] [INFO] [timer.py:197:stop] 0/32, RunningAvgSamplesPerSec=6.3545668974505904, CurrSamplesPerSec=5.480904775405896, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:43:49,702] [INFO] [timer.py:197:stop] 0/34, RunningAvgSamplesPerSec=6.350895118978619, CurrSamplesPerSec=5.64039048535495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:44:01,055] [INFO] [timer.py:197:stop] 0/36, RunningAvgSamplesPerSec=6.34923010740825, CurrSamplesPerSec=5.697280637929649, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:44:12,507] [INFO] [timer.py:197:stop] 0/38, RunningAvgSamplesPerSec=6.344325528095929, CurrSamplesPerSec=5.606869186879086, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:44:23,841] [INFO] [logging.py:68:log_dist] [Rank 0] step=20, skipped=2, lr=[4.650931663140581e-06], mom=[[0.9, 0.999]] [2022-12-16 19:44:23,843] [INFO] [timer.py:197:stop] 0/40, RunningAvgSamplesPerSec=6.343791060705328, CurrSamplesPerSec=5.697582226434801, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:44:35,251] [INFO] [timer.py:197:stop] 0/42, RunningAvgSamplesPerSec=6.341041313223634, CurrSamplesPerSec=5.669263060201902, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:44:45,976] [INFO] [stage_1_and_2.py:1765:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 32768.0, reducing to 16384.0 [2022-12-16 19:44:45,978] [INFO] [timer.py:197:stop] 0/44, RunningAvgSamplesPerSec=6.358073639041419, CurrSamplesPerSec=6.36476204743189, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:44:57,311] [INFO] [timer.py:197:stop] 0/46, RunningAvgSamplesPerSec=6.357102766194718, CurrSamplesPerSec=5.694208770316031, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:45:08,706] [INFO] [timer.py:197:stop] 0/48, RunningAvgSamplesPerSec=6.354161775690528, CurrSamplesPerSec=5.654900942339342, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:45:20,362] [INFO] [timer.py:197:stop] 0/50, RunningAvgSamplesPerSec=6.35051370610784, CurrSamplesPerSec=5.6561494303952085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.3246, 'learning_rate': 4.973833272194737e-06, 'epoch': 0.11} [2022-12-16 19:45:31,689] [INFO] [timer.py:197:stop] 0/52, RunningAvgSamplesPerSec=6.349782726929668, CurrSamplesPerSec=5.685572439917749, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:45:43,248] [INFO] [timer.py:197:stop] 0/54, RunningAvgSamplesPerSec=6.34880004266226, CurrSamplesPerSec=5.6963738880964865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:45:54,899] [INFO] [timer.py:197:stop] 0/56, RunningAvgSamplesPerSec=6.345344332473558, CurrSamplesPerSec=5.643884141232875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:46:06,245] [INFO] [timer.py:197:stop] 0/58, RunningAvgSamplesPerSec=6.3448005032861134, CurrSamplesPerSec=5.690518647702642, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:46:17,576] [INFO] [logging.py:68:log_dist] [Rank 0] step=30, skipped=3, lr=[5.303370403744525e-06], mom=[[0.9, 0.999]] [2022-12-16 19:46:17,578] [INFO] [timer.py:197:stop] 0/60, RunningAvgSamplesPerSec=6.344274046935118, CurrSamplesPerSec=5.7001045334137865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:46:29,195] [INFO] [timer.py:197:stop] 0/62, 
RunningAvgSamplesPerSec=6.342096151339661, CurrSamplesPerSec=5.669778676768185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:46:40,581] [INFO] [timer.py:197:stop] 0/64, RunningAvgSamplesPerSec=6.340969890059656, CurrSamplesPerSec=5.679663422407637, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:46:51,901] [INFO] [timer.py:197:stop] 0/66, RunningAvgSamplesPerSec=6.340862832858599, CurrSamplesPerSec=5.708144139586262, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:47:03,352] [INFO] [timer.py:197:stop] 0/68, RunningAvgSamplesPerSec=6.3395299478138005, CurrSamplesPerSec=5.699751121093182, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:47:14,725] [INFO] [timer.py:197:stop] 0/70, RunningAvgSamplesPerSec=6.338858901572622, CurrSamplesPerSec=5.6697978375746505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:47:26,090] [INFO] [timer.py:197:stop] 0/72, RunningAvgSamplesPerSec=6.338275884730358, CurrSamplesPerSec=5.6821056575417135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:47:37,502] [INFO] [timer.py:197:stop] 0/74, RunningAvgSamplesPerSec=6.336656821130215, CurrSamplesPerSec=5.693186355412951, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:47:48,841] [INFO] [timer.py:197:stop] 0/76, RunningAvgSamplesPerSec=6.336628677865746, CurrSamplesPerSec=5.704229867270073, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:48:00,193] [INFO] [timer.py:197:stop] 0/78, RunningAvgSamplesPerSec=6.336391619354949, CurrSamplesPerSec=5.6949493023399285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:48:11,884] [INFO] [logging.py:68:log_dist] [Rank 0] step=40, skipped=3, lr=[5.810371073215365e-06], mom=[[0.9, 0.999]] [2022-12-16 19:48:11,886] [INFO] [timer.py:197:stop] 0/80, RunningAvgSamplesPerSec=6.336877312300462, CurrSamplesPerSec=5.711746323017021, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:48:23,476] [INFO] [timer.py:197:stop] 0/82, 
RunningAvgSamplesPerSec=6.336437280416101, CurrSamplesPerSec=5.700073789642822, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:48:34,945] [INFO] [timer.py:197:stop] 0/84, RunningAvgSamplesPerSec=6.334408253614145, CurrSamplesPerSec=5.564071227950201, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:48:46,254] [INFO] [timer.py:197:stop] 0/86, RunningAvgSamplesPerSec=6.33464567786091, CurrSamplesPerSec=5.702617444594408, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:48:57,589] [INFO] [timer.py:197:stop] 0/88, RunningAvgSamplesPerSec=6.334718218022051, CurrSamplesPerSec=5.6997610450876905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:49:09,242] [INFO] [timer.py:197:stop] 0/90, RunningAvgSamplesPerSec=6.330253477273904, CurrSamplesPerSec=5.385235114108773, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:49:20,553] [INFO] [timer.py:197:stop] 0/92, RunningAvgSamplesPerSec=6.330733283816013, CurrSamplesPerSec=5.722714157612833, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:49:31,916] [INFO] [timer.py:197:stop] 0/94, RunningAvgSamplesPerSec=6.330690353473429, CurrSamplesPerSec=5.702511080508184, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:49:43,602] [INFO] [timer.py:197:stop] 0/96, RunningAvgSamplesPerSec=6.330121487925524, CurrSamplesPerSec=5.651281561996539, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:49:54,961] [INFO] [timer.py:197:stop] 0/98, RunningAvgSamplesPerSec=6.3297297307440985, CurrSamplesPerSec=5.654540248857908, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:50:06,563] [INFO] [logging.py:68:log_dist] [Rank 0] step=50, skipped=3, lr=[6.195318418690893e-06], mom=[[0.9, 0.999]] [2022-12-16 19:50:06,565] [INFO] [timer.py:197:stop] 0/100, RunningAvgSamplesPerSec=6.329719324883379, CurrSamplesPerSec=5.702907240231407, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.1691, 'learning_rate': 6.195318418690893e-06, 'epoch': 0.21} [2022-12-16 
19:50:18,089] [INFO] [timer.py:197:stop] 0/102, RunningAvgSamplesPerSec=6.328697909114363, CurrSamplesPerSec=5.685371341049817, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:50:29,422] [INFO] [timer.py:197:stop] 0/104, RunningAvgSamplesPerSec=6.329127259216088, CurrSamplesPerSec=5.715023301398578, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:50:40,909] [INFO] [timer.py:197:stop] 0/106, RunningAvgSamplesPerSec=6.329489892401154, CurrSamplesPerSec=5.732910611937274, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:50:52,391] [INFO] [timer.py:197:stop] 0/108, RunningAvgSamplesPerSec=6.3291082887675385, CurrSamplesPerSec=5.713503516490646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:51:03,672] [INFO] [timer.py:197:stop] 0/110, RunningAvgSamplesPerSec=6.329714179875415, CurrSamplesPerSec=5.717954946591897, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:51:15,047] [INFO] [timer.py:197:stop] 0/112, RunningAvgSamplesPerSec=6.330206129328904, CurrSamplesPerSec=5.731479198460716, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:51:26,433] [INFO] [timer.py:197:stop] 0/114, RunningAvgSamplesPerSec=6.330185801461701, CurrSamplesPerSec=5.697561668064762, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:51:37,747] [INFO] [timer.py:197:stop] 0/116, RunningAvgSamplesPerSec=6.3305736060670466, CurrSamplesPerSec=5.721073711370483, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:51:49,305] [INFO] [timer.py:197:stop] 0/118, RunningAvgSamplesPerSec=6.330816278593838, CurrSamplesPerSec=5.722915222713305, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:52:00,804] [INFO] [logging.py:68:log_dist] [Rank 0] step=60, skipped=3, lr=[6.505722008216461e-06], mom=[[0.9, 0.999]] [2022-12-16 19:52:00,806] [INFO] [timer.py:197:stop] 0/120, RunningAvgSamplesPerSec=6.330232978736771, CurrSamplesPerSec=5.6565146202221674, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:52:12,174] [INFO] 
[timer.py:197:stop] 0/122, RunningAvgSamplesPerSec=6.330056877233613, CurrSamplesPerSec=5.686394323508017, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:52:23,596] [INFO] [timer.py:197:stop] 0/124, RunningAvgSamplesPerSec=6.330396119925766, CurrSamplesPerSec=5.719701094321522, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:52:35,126] [INFO] [timer.py:197:stop] 0/126, RunningAvgSamplesPerSec=6.330297511420275, CurrSamplesPerSec=5.711849385744503, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:52:46,490] [INFO] [timer.py:197:stop] 0/128, RunningAvgSamplesPerSec=6.330124817932588, CurrSamplesPerSec=5.687188249668486, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:52:57,974] [INFO] [timer.py:197:stop] 0/130, RunningAvgSamplesPerSec=6.33039768866297, CurrSamplesPerSec=5.708138070552879, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:53:09,484] [INFO] [timer.py:197:stop] 0/132, RunningAvgSamplesPerSec=6.330326619642486, CurrSamplesPerSec=5.711341157262751, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:53:20,796] [INFO] [timer.py:197:stop] 0/134, RunningAvgSamplesPerSec=6.3306785190922925, CurrSamplesPerSec=5.725297601502475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:53:32,117] [INFO] [timer.py:197:stop] 0/136, RunningAvgSamplesPerSec=6.330777395173404, CurrSamplesPerSec=5.700345895641132, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:53:43,732] [INFO] [timer.py:197:stop] 0/138, RunningAvgSamplesPerSec=6.328317929913682, CurrSamplesPerSec=5.704530009876204, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:53:55,069] [INFO] [logging.py:68:log_dist] [Rank 0] step=70, skipped=3, lr=[6.765821034569313e-06], mom=[[0.9, 0.999]] [2022-12-16 19:53:55,070] [INFO] [timer.py:197:stop] 0/140, RunningAvgSamplesPerSec=6.3281437768124755, CurrSamplesPerSec=5.6646511134992545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:54:06,392] [INFO] [timer.py:197:stop] 
0/142, RunningAvgSamplesPerSec=6.328266137695921, CurrSamplesPerSec=5.723089458024658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:54:17,774] [INFO] [timer.py:197:stop] 0/144, RunningAvgSamplesPerSec=6.327982292630433, CurrSamplesPerSec=5.701898171177177, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:54:29,098] [INFO] [timer.py:197:stop] 0/146, RunningAvgSamplesPerSec=6.3282000633196125, CurrSamplesPerSec=5.7053862444388335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:54:40,438] [INFO] [timer.py:197:stop] 0/148, RunningAvgSamplesPerSec=6.328155482147814, CurrSamplesPerSec=5.673894035889023, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:54:52,053] [INFO] [timer.py:197:stop] 0/150, RunningAvgSamplesPerSec=6.328496355103054, CurrSamplesPerSec=5.7220768941292555, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.1644, 'learning_rate': 6.881634451095711e-06, 'epoch': 0.32} [2022-12-16 19:55:03,365] [INFO] [timer.py:197:stop] 0/152, RunningAvgSamplesPerSec=6.328569418735792, CurrSamplesPerSec=5.68189566352516, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:55:14,855] [INFO] [timer.py:197:stop] 0/154, RunningAvgSamplesPerSec=6.3275040965459715, CurrSamplesPerSec=5.571994189949849, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:55:26,452] [INFO] [timer.py:197:stop] 0/156, RunningAvgSamplesPerSec=6.327817724665443, CurrSamplesPerSec=5.721640992027081, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:55:37,782] [INFO] [timer.py:197:stop] 0/158, RunningAvgSamplesPerSec=6.327978159826033, CurrSamplesPerSec=5.712968001389319, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:55:49,157] [INFO] [logging.py:68:log_dist] [Rank 0] step=80, skipped=3, lr=[6.9896691039239e-06], mom=[[0.9, 0.999]] [2022-12-16 19:55:49,159] [INFO] [timer.py:197:stop] 0/160, RunningAvgSamplesPerSec=6.328182428410735, CurrSamplesPerSec=5.717271739216092, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-16 19:56:00,611] [INFO] [timer.py:197:stop] 0/162, RunningAvgSamplesPerSec=6.327661285455147, CurrSamplesPerSec=5.696292899154895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:56:11,980] [INFO] [timer.py:197:stop] 0/164, RunningAvgSamplesPerSec=6.327531151891573, CurrSamplesPerSec=5.687551433336378, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:56:23,313] [INFO] [timer.py:197:stop] 0/166, RunningAvgSamplesPerSec=6.327629144751145, CurrSamplesPerSec=5.690251580192909, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:56:34,645] [INFO] [timer.py:197:stop] 0/168, RunningAvgSamplesPerSec=6.327745174849396, CurrSamplesPerSec=5.682591613865132, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:56:45,962] [INFO] [timer.py:197:stop] 0/170, RunningAvgSamplesPerSec=6.3278631523075575, CurrSamplesPerSec=5.70634268898286, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:56:57,300] [INFO] [timer.py:197:stop] 0/172, RunningAvgSamplesPerSec=6.327832359726225, CurrSamplesPerSec=5.692901410316983, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:57:08,611] [INFO] [timer.py:197:stop] 0/174, RunningAvgSamplesPerSec=6.327975893412131, CurrSamplesPerSec=5.691621682502721, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:57:19,938] [INFO] [timer.py:197:stop] 0/176, RunningAvgSamplesPerSec=6.328006607076309, CurrSamplesPerSec=5.674112793988129, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:57:31,335] [INFO] [timer.py:197:stop] 0/178, RunningAvgSamplesPerSec=6.327706582184348, CurrSamplesPerSec=5.648043028531768, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:57:42,679] [INFO] [logging.py:68:log_dist] [Rank 0] step=90, skipped=3, lr=[7.186146009413563e-06], mom=[[0.9, 0.999]] [2022-12-16 19:57:42,680] [INFO] [timer.py:197:stop] 0/180, RunningAvgSamplesPerSec=6.327737545730115, CurrSamplesPerSec=5.679434382740355, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:57:54,053] 
[INFO] [timer.py:197:stop] 0/182, RunningAvgSamplesPerSec=6.327567772018162, CurrSamplesPerSec=5.671728233835681, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:58:05,553] [INFO] [timer.py:197:stop] 0/184, RunningAvgSamplesPerSec=6.327321171217064, CurrSamplesPerSec=5.650057103503205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:58:17,043] [INFO] [timer.py:197:stop] 0/186, RunningAvgSamplesPerSec=6.327257340860697, CurrSamplesPerSec=5.696403866676768, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:58:28,641] [INFO] [timer.py:197:stop] 0/188, RunningAvgSamplesPerSec=6.325485038741021, CurrSamplesPerSec=5.448433209976314, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:58:40,184] [INFO] [timer.py:197:stop] 0/190, RunningAvgSamplesPerSec=6.325371332058678, CurrSamplesPerSec=5.650139399406368, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:58:51,535] [INFO] [timer.py:197:stop] 0/192, RunningAvgSamplesPerSec=6.32535828915325, CurrSamplesPerSec=5.693068268772343, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:59:03,080] [INFO] [timer.py:197:stop] 0/194, RunningAvgSamplesPerSec=6.324000040671213, CurrSamplesPerSec=5.484969185042463, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:59:14,449] [INFO] [timer.py:197:stop] 0/196, RunningAvgSamplesPerSec=6.323911524635286, CurrSamplesPerSec=5.656116060308147, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:59:25,865] [INFO] [timer.py:197:stop] 0/198, RunningAvgSamplesPerSec=6.32353250731071, CurrSamplesPerSec=5.666084499205607, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 19:59:37,404] [INFO] [logging.py:68:log_dist] [Rank 0] step=100, skipped=3, lr=[7.361221988663844e-06], mom=[[0.9, 0.999]] [2022-12-16 19:59:37,406] [INFO] [timer.py:197:stop] 0/200, RunningAvgSamplesPerSec=6.3223735336504, CurrSamplesPerSec=5.500500901581866, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.1458, 'learning_rate': 
7.361221988663844e-06, 'epoch': 0.42} [2022-12-16 19:59:48,750] [INFO] [timer.py:197:stop] 0/202, RunningAvgSamplesPerSec=6.322279298198284, CurrSamplesPerSec=5.673023008677945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:00:00,166] [INFO] [timer.py:197:stop] 0/204, RunningAvgSamplesPerSec=6.322269734290734, CurrSamplesPerSec=5.667173773723511, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:00:11,649] [INFO] [timer.py:197:stop] 0/206, RunningAvgSamplesPerSec=6.321527490293199, CurrSamplesPerSec=5.603557346378851, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:00:23,040] [INFO] [timer.py:197:stop] 0/208, RunningAvgSamplesPerSec=6.321413257018045, CurrSamplesPerSec=5.6625088443919855, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:00:34,546] [INFO] [timer.py:197:stop] 0/210, RunningAvgSamplesPerSec=6.32141400676977, CurrSamplesPerSec=5.6849306591484146, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:00:46,129] [INFO] [timer.py:197:stop] 0/212, RunningAvgSamplesPerSec=6.321367640712001, CurrSamplesPerSec=5.6768530161234745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:00:57,500] [INFO] [timer.py:197:stop] 0/214, RunningAvgSamplesPerSec=6.321339276033958, CurrSamplesPerSec=5.702498239554356, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:01:08,843] [INFO] [timer.py:197:stop] 0/216, RunningAvgSamplesPerSec=6.321463085232523, CurrSamplesPerSec=5.712604482269962, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:01:20,279] [INFO] [timer.py:197:stop] 0/218, RunningAvgSamplesPerSec=6.321074042641444, CurrSamplesPerSec=5.642485922448768, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:01:31,627] [INFO] [logging.py:68:log_dist] [Rank 0] step=110, skipped=3, lr=[7.5191046007362515e-06], mom=[[0.9, 0.999]] [2022-12-16 20:01:31,629] [INFO] [timer.py:197:stop] 0/220, RunningAvgSamplesPerSec=6.321164389806482, CurrSamplesPerSec=5.698712681387359, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-16 20:01:43,012] [INFO] [timer.py:197:stop] 0/222, RunningAvgSamplesPerSec=6.321083383820289, CurrSamplesPerSec=5.675591772493804, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:01:54,342] [INFO] [timer.py:197:stop] 0/224, RunningAvgSamplesPerSec=6.321181478696793, CurrSamplesPerSec=5.701490768394447, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:02:05,720] [INFO] [timer.py:197:stop] 0/226, RunningAvgSamplesPerSec=6.321103908583707, CurrSamplesPerSec=5.6887073263966546, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:02:17,052] [INFO] [timer.py:197:stop] 0/228, RunningAvgSamplesPerSec=6.321279121862528, CurrSamplesPerSec=5.7162519815248904, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:02:28,413] [INFO] [timer.py:197:stop] 0/230, RunningAvgSamplesPerSec=6.321298670377185, CurrSamplesPerSec=5.709214433335775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:02:39,723] [INFO] [timer.py:197:stop] 0/232, RunningAvgSamplesPerSec=6.321502115012851, CurrSamplesPerSec=5.7279660020557355, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:02:51,084] [INFO] [timer.py:197:stop] 0/234, RunningAvgSamplesPerSec=6.321540937522988, CurrSamplesPerSec=5.69602673978007, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:03:02,438] [INFO] [timer.py:197:stop] 0/236, RunningAvgSamplesPerSec=6.32159137687095, CurrSamplesPerSec=5.691098465896081, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:03:13,796] [INFO] [timer.py:197:stop] 0/238, RunningAvgSamplesPerSec=6.321628403692657, CurrSamplesPerSec=5.69404691784024, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:03:25,137] [INFO] [logging.py:68:log_dist] [Rank 0] step=120, skipped=3, lr=[7.662870867121632e-06], mom=[[0.9, 0.999]] [2022-12-16 20:03:25,138] [INFO] [timer.py:197:stop] 0/240, RunningAvgSamplesPerSec=6.321748647888827, CurrSamplesPerSec=5.696099018567, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-16 20:03:36,598] [INFO] [timer.py:197:stop] 0/242, RunningAvgSamplesPerSec=6.321880011749018, CurrSamplesPerSec=5.714113571526187, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:03:47,936] [INFO] [timer.py:197:stop] 0/244, RunningAvgSamplesPerSec=6.322009363444331, CurrSamplesPerSec=5.6904749791235485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:03:59,258] [INFO] [timer.py:197:stop] 0/246, RunningAvgSamplesPerSec=6.322221643460198, CurrSamplesPerSec=5.7263758040612815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:04:10,613] [INFO] [timer.py:197:stop] 0/248, RunningAvgSamplesPerSec=6.322263624452999, CurrSamplesPerSec=5.692347539349729, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:04:21,955] [INFO] [timer.py:197:stop] 0/250, RunningAvgSamplesPerSec=6.3222029662188115, CurrSamplesPerSec=5.67385973653264, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.1389, 'learning_rate': 7.730207550743121e-06, 'epoch': 0.53} [2022-12-16 20:04:33,296] [INFO] [timer.py:197:stop] 0/252, RunningAvgSamplesPerSec=6.322313955883044, CurrSamplesPerSec=5.701953642527591, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:04:44,678] [INFO] [timer.py:197:stop] 0/254, RunningAvgSamplesPerSec=6.322151144538406, CurrSamplesPerSec=5.665580077082235, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:04:56,010] [INFO] [timer.py:197:stop] 0/256, RunningAvgSamplesPerSec=6.322301777645738, CurrSamplesPerSec=5.718121328046148, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:05:07,595] [INFO] [timer.py:197:stop] 0/258, RunningAvgSamplesPerSec=6.322330489705541, CurrSamplesPerSec=5.707673706391819, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:05:18,934] [INFO] [logging.py:68:log_dist] [Rank 0] step=130, skipped=3, lr=[7.794839207460995e-06], mom=[[0.9, 0.999]] [2022-12-16 20:05:18,935] [INFO] [timer.py:197:stop] 0/260, RunningAvgSamplesPerSec=6.322448219599348, 
CurrSamplesPerSec=5.706156857008184, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:05:30,262] [INFO] [timer.py:197:stop] 0/262, RunningAvgSamplesPerSec=6.322617585015774, CurrSamplesPerSec=5.711549930757668, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:05:41,626] [INFO] [timer.py:197:stop] 0/264, RunningAvgSamplesPerSec=6.322613341576608, CurrSamplesPerSec=5.687131619368439, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:05:53,023] [INFO] [timer.py:197:stop] 0/266, RunningAvgSamplesPerSec=6.322451874197788, CurrSamplesPerSec=5.676991081195374, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:06:04,411] [INFO] [timer.py:197:stop] 0/268, RunningAvgSamplesPerSec=6.322351880434625, CurrSamplesPerSec=5.67797814191006, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:06:15,738] [INFO] [timer.py:197:stop] 0/270, RunningAvgSamplesPerSec=6.322369845515796, CurrSamplesPerSec=5.689833538393228, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:06:27,100] [INFO] [timer.py:197:stop] 0/272, RunningAvgSamplesPerSec=6.322374139413545, CurrSamplesPerSec=5.697104101334954, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:06:38,457] [INFO] [timer.py:197:stop] 0/274, RunningAvgSamplesPerSec=6.322333531347227, CurrSamplesPerSec=5.704905353797042, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:06:49,837] [INFO] [timer.py:197:stop] 0/276, RunningAvgSamplesPerSec=6.3221861198507225, CurrSamplesPerSec=5.6565828005700505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:07:01,155] [INFO] [timer.py:197:stop] 0/278, RunningAvgSamplesPerSec=6.32240021581047, CurrSamplesPerSec=5.725513502620633, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:07:12,519] [INFO] [logging.py:68:log_dist] [Rank 0] step=140, skipped=3, lr=[7.916799978227501e-06], mom=[[0.9, 0.999]] [2022-12-16 20:07:12,521] [INFO] [timer.py:197:stop] 0/280, RunningAvgSamplesPerSec=6.322314987769751, 
CurrSamplesPerSec=5.682835585653388, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:07:23,835] [INFO] [timer.py:197:stop] 0/282, RunningAvgSamplesPerSec=6.322476369079172, CurrSamplesPerSec=5.728346147854608, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:07:35,189] [INFO] [timer.py:197:stop] 0/284, RunningAvgSamplesPerSec=6.322524555667764, CurrSamplesPerSec=5.702553722581063, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:07:46,547] [INFO] [timer.py:197:stop] 0/286, RunningAvgSamplesPerSec=6.322601907843428, CurrSamplesPerSec=5.701636331907292, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:07:57,869] [INFO] [timer.py:197:stop] 0/288, RunningAvgSamplesPerSec=6.32279292584737, CurrSamplesPerSec=5.722436496272999, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:08:09,214] [INFO] [timer.py:197:stop] 0/290, RunningAvgSamplesPerSec=6.3228057686994665, CurrSamplesPerSec=5.686518156577721, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:08:20,583] [INFO] [timer.py:197:stop] 0/292, RunningAvgSamplesPerSec=6.3227806640395166, CurrSamplesPerSec=5.679599491324056, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:08:31,994] [INFO] [timer.py:197:stop] 0/294, RunningAvgSamplesPerSec=6.322781839290583, CurrSamplesPerSec=5.6987397810401985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:08:43,359] [INFO] [timer.py:197:stop] 0/296, RunningAvgSamplesPerSec=6.3228034210183734, CurrSamplesPerSec=5.697710417265229, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:08:54,709] [INFO] [timer.py:197:stop] 0/298, RunningAvgSamplesPerSec=6.322863685882392, CurrSamplesPerSec=5.703479406551039, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:09:06,039] [INFO] [logging.py:68:log_dist] [Rank 0] step=150, skipped=3, lr=[8.03016458599496e-06], mom=[[0.9, 0.999]] [2022-12-16 20:09:06,041] [INFO] [timer.py:197:stop] 0/300, RunningAvgSamplesPerSec=6.322928885435723, 
CurrSamplesPerSec=5.696764601432161, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.1376, 'learning_rate': 8.03016458599496e-06, 'epoch': 0.64} [2022-12-16 20:09:17,433] [INFO] [timer.py:197:stop] 0/302, RunningAvgSamplesPerSec=6.322817514724475, CurrSamplesPerSec=5.675178281500542, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:09:28,800] [INFO] [timer.py:197:stop] 0/304, RunningAvgSamplesPerSec=6.322797946509351, CurrSamplesPerSec=5.68351227186433, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:09:40,183] [INFO] [timer.py:197:stop] 0/306, RunningAvgSamplesPerSec=6.322896759479059, CurrSamplesPerSec=5.728840535096935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:09:51,543] [INFO] [timer.py:197:stop] 0/308, RunningAvgSamplesPerSec=6.322910192350933, CurrSamplesPerSec=5.701197726295654, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:10:02,885] [INFO] [timer.py:197:stop] 0/310, RunningAvgSamplesPerSec=6.3230024007557315, CurrSamplesPerSec=5.718039475930382, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:10:14,283] [INFO] [timer.py:197:stop] 0/312, RunningAvgSamplesPerSec=6.322866428944482, CurrSamplesPerSec=5.6628476184268015, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:10:25,595] [INFO] [timer.py:197:stop] 0/314, RunningAvgSamplesPerSec=6.323075360071841, CurrSamplesPerSec=5.708922539388064, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:10:36,913] [INFO] [timer.py:197:stop] 0/316, RunningAvgSamplesPerSec=6.323254930695002, CurrSamplesPerSec=5.7192082831832405, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:10:48,220] [INFO] [timer.py:197:stop] 0/318, RunningAvgSamplesPerSec=6.323405337207393, CurrSamplesPerSec=5.70960253823397, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:10:59,560] [INFO] [logging.py:68:log_dist] [Rank 0] step=160, skipped=3, lr=[8.136065420813943e-06], mom=[[0.9, 0.999]] [2022-12-16 20:10:59,562] [INFO] [timer.py:197:stop] 
0/320, RunningAvgSamplesPerSec=6.323477777341223, CurrSamplesPerSec=5.689709560887472, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:11:10,948] [INFO] [timer.py:197:stop] 0/322, RunningAvgSamplesPerSec=6.323460607589859, CurrSamplesPerSec=5.670344454119559, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:11:22,293] [INFO] [timer.py:197:stop] 0/324, RunningAvgSamplesPerSec=6.323463740424428, CurrSamplesPerSec=5.677860204925066, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:11:33,643] [INFO] [timer.py:197:stop] 0/326, RunningAvgSamplesPerSec=6.323565869006063, CurrSamplesPerSec=5.704370236980602, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:11:44,948] [INFO] [timer.py:197:stop] 0/328, RunningAvgSamplesPerSec=6.323660017579624, CurrSamplesPerSec=5.709596951869308, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:11:56,267] [INFO] [timer.py:197:stop] 0/330, RunningAvgSamplesPerSec=6.323823018324491, CurrSamplesPerSec=5.712549046630418, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:12:07,628] [INFO] [timer.py:197:stop] 0/332, RunningAvgSamplesPerSec=6.323832251532743, CurrSamplesPerSec=5.6960204547548505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:12:18,964] [INFO] [timer.py:197:stop] 0/334, RunningAvgSamplesPerSec=6.323933543769019, CurrSamplesPerSec=5.692614320759774, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:12:30,318] [INFO] [timer.py:197:stop] 0/336, RunningAvgSamplesPerSec=6.323971450955896, CurrSamplesPerSec=5.704066717376007, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:12:41,675] [INFO] [timer.py:197:stop] 0/338, RunningAvgSamplesPerSec=6.3239844985564035, CurrSamplesPerSec=5.698655579391169, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:12:53,043] [INFO] [logging.py:68:log_dist] [Rank 0] step=170, skipped=3, lr=[8.235424875329062e-06], mom=[[0.9, 0.999]] [2022-12-16 20:12:53,045] [INFO] [timer.py:197:stop] 0/340, 
RunningAvgSamplesPerSec=6.323948740973426, CurrSamplesPerSec=5.685194337650517, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:13:04,403] [INFO] [timer.py:197:stop] 0/342, RunningAvgSamplesPerSec=6.32396319355309, CurrSamplesPerSec=5.698694534442487, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:13:15,751] [INFO] [timer.py:197:stop] 0/344, RunningAvgSamplesPerSec=6.3240232984687195, CurrSamplesPerSec=5.707178839997753, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:13:27,102] [INFO] [timer.py:197:stop] 0/346, RunningAvgSamplesPerSec=6.324050524078457, CurrSamplesPerSec=5.720586271562472, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:13:38,482] [INFO] [timer.py:197:stop] 0/348, RunningAvgSamplesPerSec=6.323927304234203, CurrSamplesPerSec=5.6728395802686995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:13:49,894] [INFO] [timer.py:197:stop] 0/350, RunningAvgSamplesPerSec=6.323775163215871, CurrSamplesPerSec=5.6732903798269225, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.1374, 'learning_rate': 8.282894746203441e-06, 'epoch': 0.74} [2022-12-16 20:14:01,273] [INFO] [timer.py:197:stop] 0/352, RunningAvgSamplesPerSec=6.323744099538408, CurrSamplesPerSec=5.701922878806938, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:14:12,649] [INFO] [timer.py:197:stop] 0/354, RunningAvgSamplesPerSec=6.32363656035121, CurrSamplesPerSec=5.668106198002945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:14:24,018] [INFO] [timer.py:197:stop] 0/356, RunningAvgSamplesPerSec=6.323612329000168, CurrSamplesPerSec=5.690963815932577, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:14:35,387] [INFO] [timer.py:197:stop] 0/358, RunningAvgSamplesPerSec=6.323592321881347, CurrSamplesPerSec=5.682652484549207, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:14:46,747] [INFO] [logging.py:68:log_dist] [Rank 0] step=180, skipped=3, lr=[8.329004259959669e-06], mom=[[0.9, 0.999]] [2022-12-16 
20:14:46,749] [INFO] [timer.py:197:stop] 0/360, RunningAvgSamplesPerSec=6.323480334411763, CurrSamplesPerSec=5.657225111507851, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:14:58,121] [INFO] [timer.py:197:stop] 0/362, RunningAvgSamplesPerSec=6.323431372663673, CurrSamplesPerSec=5.678844444331616, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:15:09,479] [INFO] [timer.py:197:stop] 0/364, RunningAvgSamplesPerSec=6.32339248589731, CurrSamplesPerSec=5.671576284583911, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:15:20,883] [INFO] [timer.py:197:stop] 0/366, RunningAvgSamplesPerSec=6.323253600927474, CurrSamplesPerSec=5.6644067881016404, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:15:32,248] [INFO] [timer.py:197:stop] 0/368, RunningAvgSamplesPerSec=6.323253715509086, CurrSamplesPerSec=5.701989251343157, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:15:43,589] [INFO] [timer.py:197:stop] 0/370, RunningAvgSamplesPerSec=6.323333571683783, CurrSamplesPerSec=5.709123607893714, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:15:54,984] [INFO] [timer.py:197:stop] 0/372, RunningAvgSamplesPerSec=6.3231740893087585, CurrSamplesPerSec=5.660488032411503, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:16:06,407] [INFO] [timer.py:197:stop] 0/374, RunningAvgSamplesPerSec=6.3231801853876135, CurrSamplesPerSec=5.70347698290224, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:16:17,744] [INFO] [timer.py:197:stop] 0/376, RunningAvgSamplesPerSec=6.323287920909756, CurrSamplesPerSec=5.702711939935516, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:16:29,091] [INFO] [timer.py:197:stop] 0/378, RunningAvgSamplesPerSec=6.323286122683462, CurrSamplesPerSec=5.687698937227285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:16:40,418] [INFO] [logging.py:68:log_dist] [Rank 0] step=190, skipped=3, lr=[8.417439256037237e-06], mom=[[0.9, 0.999]] [2022-12-16 20:16:40,420] [INFO] 
[timer.py:197:stop] 0/380, RunningAvgSamplesPerSec=6.323397786842634, CurrSamplesPerSec=5.71710686834569, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:16:51,771] [INFO] [timer.py:197:stop] 0/382, RunningAvgSamplesPerSec=6.323379578682893, CurrSamplesPerSec=5.697321750762022, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:17:03,330] [INFO] [timer.py:197:stop] 0/384, RunningAvgSamplesPerSec=6.323437318866515, CurrSamplesPerSec=5.70628228013329, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:17:14,693] [INFO] [timer.py:197:stop] 0/386, RunningAvgSamplesPerSec=6.323427514274825, CurrSamplesPerSec=5.700057328547446, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:17:26,094] [INFO] [timer.py:197:stop] 0/388, RunningAvgSamplesPerSec=6.323306500731329, CurrSamplesPerSec=5.678972033656878, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:17:37,448] [INFO] [timer.py:197:stop] 0/390, RunningAvgSamplesPerSec=6.32331374462535, CurrSamplesPerSec=5.687762568718759, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:17:48,794] [INFO] [timer.py:197:stop] 0/392, RunningAvgSamplesPerSec=6.323321526858228, CurrSamplesPerSec=5.692411274897352, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:18:00,146] [INFO] [timer.py:197:stop] 0/394, RunningAvgSamplesPerSec=6.323302134655659, CurrSamplesPerSec=5.682476612732748, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:18:11,512] [INFO] [timer.py:197:stop] 0/396, RunningAvgSamplesPerSec=6.323284170376101, CurrSamplesPerSec=5.6785734261450145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:18:23,016] [INFO] [timer.py:197:stop] 0/398, RunningAvgSamplesPerSec=6.323343518822214, CurrSamplesPerSec=5.709000973847339, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:18:34,373] [INFO] [logging.py:68:log_dist] [Rank 0] step=200, skipped=3, lr=[8.501266121799902e-06], mom=[[0.9, 0.999]] [2022-12-16 20:18:34,375] [INFO] [timer.py:197:stop] 0/400, 
RunningAvgSamplesPerSec=6.32328421011125, CurrSamplesPerSec=5.6872694618968955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.1287, 'learning_rate': 8.501266121799902e-06, 'epoch': 0.85} [2022-12-16 20:18:45,718] [INFO] [timer.py:197:stop] 0/402, RunningAvgSamplesPerSec=6.323284665364629, CurrSamplesPerSec=5.690986739759475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:18:57,149] [INFO] [timer.py:197:stop] 0/404, RunningAvgSamplesPerSec=6.323064316517982, CurrSamplesPerSec=5.632093995709347, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:19:07,878] [INFO] [stage_1_and_2.py:1765:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384.0, reducing to 8192.0 [2022-12-16 20:19:07,880] [INFO] [timer.py:197:stop] 0/406, RunningAvgSamplesPerSec=6.324963121796406, CurrSamplesPerSec=6.401309166670601, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:19:19,240] [INFO] [timer.py:197:stop] 0/408, RunningAvgSamplesPerSec=6.3249501610981325, CurrSamplesPerSec=5.702367168220342, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:19:30,606] [INFO] [timer.py:197:stop] 0/410, RunningAvgSamplesPerSec=6.324919708467864, CurrSamplesPerSec=5.679302927602853, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:19:41,978] [INFO] [timer.py:197:stop] 0/412, RunningAvgSamplesPerSec=6.324875286613773, CurrSamplesPerSec=5.687440087427545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:19:53,395] [INFO] [timer.py:197:stop] 0/414, RunningAvgSamplesPerSec=6.32487659033467, CurrSamplesPerSec=5.692081989929908, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:20:04,724] [INFO] [timer.py:197:stop] 0/416, RunningAvgSamplesPerSec=6.32491419453475, CurrSamplesPerSec=5.70353587815117, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:20:16,092] [INFO] [timer.py:197:stop] 0/418, RunningAvgSamplesPerSec=6.324881861283918, CurrSamplesPerSec=5.668392257188035, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-16 20:20:27,452] [INFO] [logging.py:68:log_dist] [Rank 0] step=210, skipped=4, lr=[8.573149077803088e-06], mom=[[0.9, 0.999]] [2022-12-16 20:20:27,454] [INFO] [timer.py:197:stop] 0/420, RunningAvgSamplesPerSec=6.324865240196057, CurrSamplesPerSec=5.678834112490291, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:20:38,813] [INFO] [timer.py:197:stop] 0/422, RunningAvgSamplesPerSec=6.324857186084627, CurrSamplesPerSec=5.690756785945908, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:20:50,150] [INFO] [timer.py:197:stop] 0/424, RunningAvgSamplesPerSec=6.324868863513124, CurrSamplesPerSec=5.700334274925125, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:21:01,502] [INFO] [timer.py:197:stop] 0/426, RunningAvgSamplesPerSec=6.324832543627125, CurrSamplesPerSec=5.698868991857596, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:21:12,881] [INFO] [timer.py:197:stop] 0/428, RunningAvgSamplesPerSec=6.324672136242403, CurrSamplesPerSec=5.638443932850549, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:21:24,242] [INFO] [timer.py:197:stop] 0/430, RunningAvgSamplesPerSec=6.324660692835897, CurrSamplesPerSec=5.690038088967808, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:21:35,582] [INFO] [timer.py:197:stop] 0/432, RunningAvgSamplesPerSec=6.324708429204134, CurrSamplesPerSec=5.704986345333319, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:21:46,949] [INFO] [timer.py:197:stop] 0/434, RunningAvgSamplesPerSec=6.324708609830103, CurrSamplesPerSec=5.6855702723091985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:21:58,286] [INFO] [timer.py:197:stop] 0/436, RunningAvgSamplesPerSec=6.324717537651673, CurrSamplesPerSec=5.70938394951087, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:22:09,632] [INFO] [timer.py:197:stop] 0/438, RunningAvgSamplesPerSec=6.324745496447643, CurrSamplesPerSec=5.714860750176948, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-16 20:22:20,926] [INFO] [logging.py:68:log_dist] [Rank 0] step=220, skipped=4, lr=[8.64942458567722e-06], mom=[[0.9, 0.999]] [2022-12-16 20:22:20,928] [INFO] [timer.py:197:stop] 0/440, RunningAvgSamplesPerSec=6.324929046177664, CurrSamplesPerSec=5.729245008296917, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:22:32,230] [INFO] [timer.py:197:stop] 0/442, RunningAvgSamplesPerSec=6.325051638910014, CurrSamplesPerSec=5.723627116481493, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:22:43,588] [INFO] [timer.py:197:stop] 0/444, RunningAvgSamplesPerSec=6.325065678448945, CurrSamplesPerSec=5.693626627517337, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:22:54,930] [INFO] [timer.py:197:stop] 0/446, RunningAvgSamplesPerSec=6.325110434850466, CurrSamplesPerSec=5.710919039097566, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:23:06,252] [INFO] [timer.py:197:stop] 0/448, RunningAvgSamplesPerSec=6.325167702164579, CurrSamplesPerSec=5.705739870822447, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:23:17,632] [INFO] [timer.py:197:stop] 0/450, RunningAvgSamplesPerSec=6.325104594839814, CurrSamplesPerSec=5.653316282959546, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.1225, 'learning_rate': 8.686247975778677e-06, 'epoch': 0.95} [2022-12-16 20:23:29,039] [INFO] [timer.py:197:stop] 0/452, RunningAvgSamplesPerSec=6.3249838784193, CurrSamplesPerSec=5.654148160062199, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:23:40,379] [INFO] [timer.py:197:stop] 0/454, RunningAvgSamplesPerSec=6.325056824708796, CurrSamplesPerSec=5.716592834256081, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:23:51,801] [INFO] [timer.py:197:stop] 0/456, RunningAvgSamplesPerSec=6.324918730493177, CurrSamplesPerSec=5.666096698292425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:24:03,158] [INFO] [timer.py:197:stop] 0/458, RunningAvgSamplesPerSec=6.3249018266408745, 
CurrSamplesPerSec=5.6668892730962055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:24:14,505] [INFO] [logging.py:68:log_dist] [Rank 0] step=230, skipped=4, lr=[8.722247506883805e-06], mom=[[0.9, 0.999]] [2022-12-16 20:24:14,507] [INFO] [timer.py:197:stop] 0/460, RunningAvgSamplesPerSec=6.324939407967614, CurrSamplesPerSec=5.702041090822069, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:24:25,830] [INFO] [timer.py:197:stop] 0/462, RunningAvgSamplesPerSec=6.3249985635148835, CurrSamplesPerSec=5.695622592543446, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:24:37,171] [INFO] [timer.py:197:stop] 0/464, RunningAvgSamplesPerSec=6.325064934601947, CurrSamplesPerSec=5.706983489691964, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:24:47,866] [INFO] [stage_1_and_2.py:1765:step] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192.0, reducing to 4096.0 [2022-12-16 20:24:47,868] [INFO] [timer.py:197:stop] 0/466, RunningAvgSamplesPerSec=6.326814356293397, CurrSamplesPerSec=6.417096329026, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:24:59,202] [INFO] [timer.py:197:stop] 0/468, RunningAvgSamplesPerSec=6.326888453208697, CurrSamplesPerSec=5.710727563347197, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:25:10,596] [INFO] [timer.py:197:stop] 0/470, RunningAvgSamplesPerSec=6.32680186680534, CurrSamplesPerSec=5.657430662739473, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:25:19,121] [INFO] [timer.py:197:stop] 0/472, RunningAvgSamplesPerSec=6.333450758411454, CurrSamplesPerSec=10.137390170749411, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:25:30,504] [INFO] [timer.py:197:stop] 0/474, RunningAvgSamplesPerSec=6.333370618689054, CurrSamplesPerSec=5.659949520087447, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:25:41,898] [INFO] [timer.py:197:stop] 0/476, RunningAvgSamplesPerSec=6.333330907948472, CurrSamplesPerSec=5.689830402708782, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:25:53,272] [INFO] [timer.py:197:stop] 0/478, RunningAvgSamplesPerSec=6.333309972082303, CurrSamplesPerSec=5.679410830915424, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:26:04,786] [INFO] [logging.py:68:log_dist] [Rank 0] step=240, skipped=5, lr=[8.785084156039184e-06], mom=[[0.9, 0.999]] [2022-12-16 20:26:04,788] [INFO] [timer.py:197:stop] 0/480, RunningAvgSamplesPerSec=6.333324209339847, CurrSamplesPerSec=5.711554548737424, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:26:16,153] [INFO] [timer.py:197:stop] 0/482, RunningAvgSamplesPerSec=6.333275648262772, CurrSamplesPerSec=5.67584738469762, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:26:27,536] [INFO] [timer.py:197:stop] 0/484, RunningAvgSamplesPerSec=6.333180443017503, CurrSamplesPerSec=5.658684801072698, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:26:38,936] [INFO] [timer.py:197:stop] 0/486, RunningAvgSamplesPerSec=6.333063356107658, CurrSamplesPerSec=5.66522088952818, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:26:50,275] [INFO] [timer.py:197:stop] 0/488, RunningAvgSamplesPerSec=6.333064958163027, CurrSamplesPerSec=5.716922526210353, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:27:01,596] [INFO] [timer.py:197:stop] 0/490, RunningAvgSamplesPerSec=6.333097497150548, CurrSamplesPerSec=5.708483783263635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:27:12,974] [INFO] [timer.py:197:stop] 0/492, RunningAvgSamplesPerSec=6.333030716341655, CurrSamplesPerSec=5.669087297819405, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:27:24,283] [INFO] [timer.py:197:stop] 0/494, RunningAvgSamplesPerSec=6.33308950716907, CurrSamplesPerSec=5.72159806401921, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:27:35,659] [INFO] [timer.py:197:stop] 0/496, RunningAvgSamplesPerSec=6.333030116862234, CurrSamplesPerSec=5.685264896868218, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-16 20:27:47,023] [INFO] [timer.py:197:stop] 0/498, RunningAvgSamplesPerSec=6.332996476016625, CurrSamplesPerSec=5.69285118566385, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:27:58,379] [INFO] [logging.py:68:log_dist] [Rank 0] step=250, skipped=5, lr=[8.852140188761744e-06], mom=[[0.9, 0.999]] [2022-12-16 20:27:58,380] [INFO] [timer.py:197:stop] 0/500, RunningAvgSamplesPerSec=6.332979581463783, CurrSamplesPerSec=5.68674728607022, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0911, 'learning_rate': 8.852140188761744e-06, 'epoch': 1.06} [2022-12-16 20:28:09,749] [INFO] [timer.py:197:stop] 0/502, RunningAvgSamplesPerSec=6.332968543294653, CurrSamplesPerSec=5.698553960024957, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:28:21,117] [INFO] [timer.py:197:stop] 0/504, RunningAvgSamplesPerSec=6.3329375267007215, CurrSamplesPerSec=5.680363153261814, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:28:32,487] [INFO] [timer.py:197:stop] 0/506, RunningAvgSamplesPerSec=6.332904182885525, CurrSamplesPerSec=5.687939491029711, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:28:43,855] [INFO] [timer.py:197:stop] 0/508, RunningAvgSamplesPerSec=6.33285409700931, CurrSamplesPerSec=5.677202633924842, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:28:55,191] [INFO] [timer.py:197:stop] 0/510, RunningAvgSamplesPerSec=6.332775886204445, CurrSamplesPerSec=5.648280477264634, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:29:06,683] [INFO] [timer.py:197:stop] 0/512, RunningAvgSamplesPerSec=6.332755158221311, CurrSamplesPerSec=5.688899257349144, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:29:18,066] [INFO] [timer.py:197:stop] 0/514, RunningAvgSamplesPerSec=6.332672689917679, CurrSamplesPerSec=5.677710568918598, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:29:29,434] [INFO] [timer.py:197:stop] 0/516, RunningAvgSamplesPerSec=6.3326342128881095, 
CurrSamplesPerSec=5.682176861785842, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:29:40,802] [INFO] [timer.py:197:stop] 0/518, RunningAvgSamplesPerSec=6.332589814616271, CurrSamplesPerSec=5.674145177376899, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:29:52,188] [INFO] [logging.py:68:log_dist] [Rank 0] step=260, skipped=5, lr=[8.916513249749862e-06], mom=[[0.9, 0.999]] [2022-12-16 20:29:52,189] [INFO] [timer.py:197:stop] 0/520, RunningAvgSamplesPerSec=6.332535441063731, CurrSamplesPerSec=5.690993254986084, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:30:03,651] [INFO] [timer.py:197:stop] 0/522, RunningAvgSamplesPerSec=6.332321192236601, CurrSamplesPerSec=5.627202311158195, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:30:15,079] [INFO] [timer.py:197:stop] 0/524, RunningAvgSamplesPerSec=6.332264757881449, CurrSamplesPerSec=5.681202047908641, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:30:26,506] [INFO] [timer.py:197:stop] 0/526, RunningAvgSamplesPerSec=6.332111989659685, CurrSamplesPerSec=5.663871593105844, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:30:37,853] [INFO] [timer.py:197:stop] 0/528, RunningAvgSamplesPerSec=6.332086845890149, CurrSamplesPerSec=5.697914808915609, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:30:49,299] [INFO] [timer.py:197:stop] 0/530, RunningAvgSamplesPerSec=6.332118305953814, CurrSamplesPerSec=5.707362068898067, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:31:00,725] [INFO] [timer.py:197:stop] 0/532, RunningAvgSamplesPerSec=6.332145135171992, CurrSamplesPerSec=5.7186139518732695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:31:12,098] [INFO] [timer.py:197:stop] 0/534, RunningAvgSamplesPerSec=6.332096332695988, CurrSamplesPerSec=5.686162813500998, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:31:23,474] [INFO] [timer.py:197:stop] 0/536, RunningAvgSamplesPerSec=6.332142347447645, 
CurrSamplesPerSec=5.717022610095439, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:31:34,884] [INFO] [timer.py:197:stop] 0/538, RunningAvgSamplesPerSec=6.3320406629010915, CurrSamplesPerSec=5.6585333113317695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:31:46,283] [INFO] [logging.py:68:log_dist] [Rank 0] step=270, skipped=5, lr=[8.978409800937961e-06], mom=[[0.9, 0.999]] [2022-12-16 20:31:46,285] [INFO] [timer.py:197:stop] 0/540, RunningAvgSamplesPerSec=6.331949549833869, CurrSamplesPerSec=5.658519236318848, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:31:57,631] [INFO] [timer.py:197:stop] 0/542, RunningAvgSamplesPerSec=6.331908691993134, CurrSamplesPerSec=5.674752614311713, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:32:09,011] [INFO] [timer.py:197:stop] 0/544, RunningAvgSamplesPerSec=6.331837872115681, CurrSamplesPerSec=5.664258577485706, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:32:20,410] [INFO] [timer.py:197:stop] 0/546, RunningAvgSamplesPerSec=6.331805803303936, CurrSamplesPerSec=5.676086937031499, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:32:31,832] [INFO] [timer.py:197:stop] 0/548, RunningAvgSamplesPerSec=6.331781192265678, CurrSamplesPerSec=5.680064586753917, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:32:43,212] [INFO] [timer.py:197:stop] 0/550, RunningAvgSamplesPerSec=6.3318012701947355, CurrSamplesPerSec=5.7015488960272975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0701, 'learning_rate': 9.00848753507038e-06, 'epoch': 1.17} [2022-12-16 20:32:54,503] [INFO] [timer.py:197:stop] 0/552, RunningAvgSamplesPerSec=6.331942303093126, CurrSamplesPerSec=5.7685950512679565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:33:05,886] [INFO] [timer.py:197:stop] 0/554, RunningAvgSamplesPerSec=6.331877715290052, CurrSamplesPerSec=5.652510360163067, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:33:17,264] [INFO] 
[timer.py:197:stop] 0/556, RunningAvgSamplesPerSec=6.331889387981903, CurrSamplesPerSec=5.6983301682483605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:33:28,711] [INFO] [timer.py:197:stop] 0/558, RunningAvgSamplesPerSec=6.331829135305767, CurrSamplesPerSec=5.683874023366699, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:33:40,068] [INFO] [logging.py:68:log_dist] [Rank 0] step=280, skipped=5, lr=[9.038013352913754e-06], mom=[[0.9, 0.999]] [2022-12-16 20:33:40,070] [INFO] [timer.py:197:stop] 0/560, RunningAvgSamplesPerSec=6.331762277422967, CurrSamplesPerSec=5.6758255427552475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:33:51,539] [INFO] [timer.py:197:stop] 0/562, RunningAvgSamplesPerSec=6.33149639148276, CurrSamplesPerSec=5.578608736601625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:34:02,921] [INFO] [timer.py:197:stop] 0/564, RunningAvgSamplesPerSec=6.331446008676222, CurrSamplesPerSec=5.665100373416714, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:34:14,469] [INFO] [timer.py:197:stop] 0/566, RunningAvgSamplesPerSec=6.331452750529917, CurrSamplesPerSec=5.698296540516734, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:34:25,876] [INFO] [timer.py:197:stop] 0/568, RunningAvgSamplesPerSec=6.331366092485218, CurrSamplesPerSec=5.647619758537656, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:34:37,224] [INFO] [timer.py:197:stop] 0/570, RunningAvgSamplesPerSec=6.331366844477702, CurrSamplesPerSec=5.709335862237885, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:34:48,577] [INFO] [timer.py:197:stop] 0/572, RunningAvgSamplesPerSec=6.331423638643338, CurrSamplesPerSec=5.7014241652722255, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:35:00,234] [INFO] [timer.py:197:stop] 0/574, RunningAvgSamplesPerSec=6.331173142949745, CurrSamplesPerSec=5.674498540151139, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:35:11,592] [INFO] [timer.py:197:stop] 
0/576, RunningAvgSamplesPerSec=6.331154306508723, CurrSamplesPerSec=5.684172749770408, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:35:23,056] [INFO] [timer.py:197:stop] 0/578, RunningAvgSamplesPerSec=6.331166114638882, CurrSamplesPerSec=5.694031457786667, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:35:34,645] [INFO] [logging.py:68:log_dist] [Rank 0] step=290, skipped=5, lr=[9.095487745564754e-06], mom=[[0.9, 0.999]] [2022-12-16 20:35:34,647] [INFO] [timer.py:197:stop] 0/580, RunningAvgSamplesPerSec=6.33109363132063, CurrSamplesPerSec=5.688584603467747, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:35:45,976] [INFO] [timer.py:197:stop] 0/582, RunningAvgSamplesPerSec=6.331107874787885, CurrSamplesPerSec=5.708166230976763, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:35:57,335] [INFO] [timer.py:197:stop] 0/584, RunningAvgSamplesPerSec=6.331087853249488, CurrSamplesPerSec=5.686966795466654, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:36:08,878] [INFO] [timer.py:197:stop] 0/586, RunningAvgSamplesPerSec=6.331042606888964, CurrSamplesPerSec=5.681727775695035, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:36:20,260] [INFO] [timer.py:197:stop] 0/588, RunningAvgSamplesPerSec=6.330978840033063, CurrSamplesPerSec=5.674192433800074, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:36:31,673] [INFO] [timer.py:197:stop] 0/590, RunningAvgSamplesPerSec=6.331058742308961, CurrSamplesPerSec=5.72288667256049, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:36:43,263] [INFO] [timer.py:197:stop] 0/592, RunningAvgSamplesPerSec=6.331048536749523, CurrSamplesPerSec=5.697260565409411, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:36:54,663] [INFO] [timer.py:197:stop] 0/594, RunningAvgSamplesPerSec=6.330940201893569, CurrSamplesPerSec=5.636410670239449, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:37:06,011] [INFO] [timer.py:197:stop] 0/596, 
RunningAvgSamplesPerSec=6.330946589807912, CurrSamplesPerSec=5.697826519598953, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:37:17,405] [INFO] [timer.py:197:stop] 0/598, RunningAvgSamplesPerSec=6.330878178920464, CurrSamplesPerSec=5.708957749637251, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:37:28,767] [INFO] [logging.py:68:log_dist] [Rank 0] step=300, skipped=5, lr=[9.150979862726452e-06], mom=[[0.9, 0.999]] [2022-12-16 20:37:28,769] [INFO] [timer.py:197:stop] 0/600, RunningAvgSamplesPerSec=6.330782670927434, CurrSamplesPerSec=5.643513461533772, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.068, 'learning_rate': 9.150979862726452e-06, 'epoch': 1.27} [2022-12-16 20:37:40,125] [INFO] [timer.py:197:stop] 0/602, RunningAvgSamplesPerSec=6.330782383440869, CurrSamplesPerSec=5.690124207229166, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:37:51,504] [INFO] [timer.py:197:stop] 0/604, RunningAvgSamplesPerSec=6.3306832707920435, CurrSamplesPerSec=5.6678194496573235, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:38:02,851] [INFO] [timer.py:197:stop] 0/606, RunningAvgSamplesPerSec=6.330655808571751, CurrSamplesPerSec=5.682673657266739, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:38:14,168] [INFO] [timer.py:197:stop] 0/608, RunningAvgSamplesPerSec=6.330691143435459, CurrSamplesPerSec=5.6872265661282, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:38:25,693] [INFO] [timer.py:197:stop] 0/610, RunningAvgSamplesPerSec=6.330329932714655, CurrSamplesPerSec=5.668653446060722, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:38:37,073] [INFO] [timer.py:197:stop] 0/612, RunningAvgSamplesPerSec=6.330272091806936, CurrSamplesPerSec=5.659673142650592, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:38:48,518] [INFO] [timer.py:197:stop] 0/614, RunningAvgSamplesPerSec=6.330101699475937, CurrSamplesPerSec=5.62189192221988, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 
20:39:00,031] [INFO] [timer.py:197:stop] 0/616, RunningAvgSamplesPerSec=6.330042213874827, CurrSamplesPerSec=5.6853116162218384, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:39:11,430] [INFO] [timer.py:197:stop] 0/618, RunningAvgSamplesPerSec=6.329934203787018, CurrSamplesPerSec=5.662486866094601, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:39:22,683] [INFO] [logging.py:68:log_dist] [Rank 0] step=310, skipped=5, lr=[9.204621894113846e-06], mom=[[0.9, 0.999]] [2022-12-16 20:39:22,684] [INFO] [timer.py:197:stop] 0/620, RunningAvgSamplesPerSec=6.329990960271664, CurrSamplesPerSec=5.721851007076893, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:39:34,075] [INFO] [timer.py:197:stop] 0/622, RunningAvgSamplesPerSec=6.330011262521667, CurrSamplesPerSec=5.712285012692834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:39:45,443] [INFO] [timer.py:197:stop] 0/624, RunningAvgSamplesPerSec=6.329971505255353, CurrSamplesPerSec=5.673877006086818, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:39:56,776] [INFO] [timer.py:197:stop] 0/626, RunningAvgSamplesPerSec=6.330001807060071, CurrSamplesPerSec=5.713938909287577, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:40:08,120] [INFO] [timer.py:197:stop] 0/628, RunningAvgSamplesPerSec=6.330083907626634, CurrSamplesPerSec=5.71931502735193, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:40:19,422] [INFO] [timer.py:197:stop] 0/630, RunningAvgSamplesPerSec=6.330141783936141, CurrSamplesPerSec=5.7165628862802125, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:40:30,734] [INFO] [timer.py:197:stop] 0/632, RunningAvgSamplesPerSec=6.33018163456153, CurrSamplesPerSec=5.710113372472082, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:40:42,220] [INFO] [timer.py:197:stop] 0/634, RunningAvgSamplesPerSec=6.330237538427551, CurrSamplesPerSec=5.711593437284221, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:40:53,732] [INFO] 
[timer.py:197:stop] 0/636, RunningAvgSamplesPerSec=6.330298913515279, CurrSamplesPerSec=5.720521903529705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:41:05,190] [INFO] [timer.py:197:stop] 0/638, RunningAvgSamplesPerSec=6.33008533267941, CurrSamplesPerSec=5.573157273614473, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:41:16,628] [INFO] [logging.py:68:log_dist] [Rank 0] step=320, skipped=5, lr=[9.256533232218034e-06], mom=[[0.9, 0.999]] [2022-12-16 20:41:16,630] [INFO] [timer.py:197:stop] 0/640, RunningAvgSamplesPerSec=6.330096923728828, CurrSamplesPerSec=5.724061856567901, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:41:28,236] [INFO] [timer.py:197:stop] 0/642, RunningAvgSamplesPerSec=6.330130764892317, CurrSamplesPerSec=5.707274214632692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:41:39,681] [INFO] [timer.py:197:stop] 0/644, RunningAvgSamplesPerSec=6.329947385873928, CurrSamplesPerSec=5.589888718849701, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:41:51,204] [INFO] [timer.py:197:stop] 0/646, RunningAvgSamplesPerSec=6.33000563881282, CurrSamplesPerSec=5.708520444862126, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:42:02,680] [INFO] [timer.py:197:stop] 0/648, RunningAvgSamplesPerSec=6.330004564500237, CurrSamplesPerSec=5.6796963498806345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:42:14,226] [INFO] [timer.py:197:stop] 0/650, RunningAvgSamplesPerSec=6.329629407966906, CurrSamplesPerSec=5.467900776983032, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0673, 'learning_rate': 9.281874101213678e-06, 'epoch': 1.38} [2022-12-16 20:42:25,619] [INFO] [timer.py:197:stop] 0/652, RunningAvgSamplesPerSec=6.3297077715609475, CurrSamplesPerSec=5.719371812973972, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:42:36,953] [INFO] [timer.py:197:stop] 0/654, RunningAvgSamplesPerSec=6.3297367998254535, CurrSamplesPerSec=5.705432082423167, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-16 20:42:48,561] [INFO] [timer.py:197:stop] 0/656, RunningAvgSamplesPerSec=6.329244035664368, CurrSamplesPerSec=5.4295023895352434, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:42:59,916] [INFO] [timer.py:197:stop] 0/658, RunningAvgSamplesPerSec=6.329237844863367, CurrSamplesPerSec=5.688861882844252, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:43:11,315] [INFO] [logging.py:68:log_dist] [Rank 0] step=330, skipped=5, lr=[9.306822072655195e-06], mom=[[0.9, 0.999]] [2022-12-16 20:43:11,317] [INFO] [timer.py:197:stop] 0/660, RunningAvgSamplesPerSec=6.329147547268884, CurrSamplesPerSec=5.661335155406955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:43:22,771] [INFO] [timer.py:197:stop] 0/662, RunningAvgSamplesPerSec=6.328950035472537, CurrSamplesPerSec=5.603190307343656, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:43:34,098] [INFO] [timer.py:197:stop] 0/664, RunningAvgSamplesPerSec=6.328962025890756, CurrSamplesPerSec=5.695756979637728, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:43:45,431] [INFO] [timer.py:197:stop] 0/666, RunningAvgSamplesPerSec=6.3289940857266, CurrSamplesPerSec=5.71611467805298, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:43:57,131] [INFO] [timer.py:197:stop] 0/668, RunningAvgSamplesPerSec=6.328336634156604, CurrSamplesPerSec=5.346707418970935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:44:08,494] [INFO] [timer.py:197:stop] 0/670, RunningAvgSamplesPerSec=6.328314264906798, CurrSamplesPerSec=5.697728074190073, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:44:19,922] [INFO] [timer.py:197:stop] 0/672, RunningAvgSamplesPerSec=6.328180041670801, CurrSamplesPerSec=5.648812729681554, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:44:31,398] [INFO] [timer.py:197:stop] 0/674, RunningAvgSamplesPerSec=6.3281297840632105, CurrSamplesPerSec=5.669640962282183, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-16 20:44:42,725] [INFO] [timer.py:197:stop] 0/676, RunningAvgSamplesPerSec=6.328147856603661, CurrSamplesPerSec=5.706415715042744, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:44:54,300] [INFO] [timer.py:197:stop] 0/678, RunningAvgSamplesPerSec=6.328126766932452, CurrSamplesPerSec=5.706175536698659, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:45:05,797] [INFO] [logging.py:68:log_dist] [Rank 0] step=340, skipped=5, lr=[9.355586771917604e-06], mom=[[0.9, 0.999]] [2022-12-16 20:45:05,799] [INFO] [timer.py:197:stop] 0/680, RunningAvgSamplesPerSec=6.328068517423547, CurrSamplesPerSec=5.6591135490713995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:45:17,174] [INFO] [timer.py:197:stop] 0/682, RunningAvgSamplesPerSec=6.328021957238399, CurrSamplesPerSec=5.682638529889822, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:45:28,596] [INFO] [timer.py:197:stop] 0/684, RunningAvgSamplesPerSec=6.327975433273524, CurrSamplesPerSec=5.680331179663342, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:45:40,277] [INFO] [timer.py:197:stop] 0/686, RunningAvgSamplesPerSec=6.327854974024864, CurrSamplesPerSec=5.683192919422884, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:45:51,668] [INFO] [timer.py:197:stop] 0/688, RunningAvgSamplesPerSec=6.3277861583761235, CurrSamplesPerSec=5.6660746921346865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:46:03,229] [INFO] [timer.py:197:stop] 0/690, RunningAvgSamplesPerSec=6.327768422044784, CurrSamplesPerSec=5.680019395846795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:46:14,838] [INFO] [timer.py:197:stop] 0/692, RunningAvgSamplesPerSec=6.327696014574452, CurrSamplesPerSec=5.657069646483641, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:46:26,177] [INFO] [timer.py:197:stop] 0/694, RunningAvgSamplesPerSec=6.327699873718016, CurrSamplesPerSec=5.690042913451338, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 
20:46:37,632] [INFO] [timer.py:197:stop] 0/696, RunningAvgSamplesPerSec=6.3276904427181755, CurrSamplesPerSec=5.683944790522634, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:46:49,248] [INFO] [timer.py:197:stop] 0/698, RunningAvgSamplesPerSec=6.327720553451032, CurrSamplesPerSec=5.706627525449706, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:47:00,579] [INFO] [logging.py:68:log_dist] [Rank 0] step=350, skipped=5, lr=[9.402917005361869e-06], mom=[[0.9, 0.999]] [2022-12-16 20:47:00,581] [INFO] [timer.py:197:stop] 0/700, RunningAvgSamplesPerSec=6.327703647205433, CurrSamplesPerSec=5.679101551095431, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0679, 'learning_rate': 9.402917005361869e-06, 'epoch': 1.48} [2022-12-16 20:47:11,950] [INFO] [timer.py:197:stop] 0/702, RunningAvgSamplesPerSec=6.327654490303129, CurrSamplesPerSec=5.675178761431945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:47:23,335] [INFO] [timer.py:197:stop] 0/704, RunningAvgSamplesPerSec=6.327571946536897, CurrSamplesPerSec=5.68916306298161, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:47:34,792] [INFO] [timer.py:197:stop] 0/706, RunningAvgSamplesPerSec=6.3273886387415645, CurrSamplesPerSec=5.601829945397302, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:47:46,190] [INFO] [timer.py:197:stop] 0/708, RunningAvgSamplesPerSec=6.327378689010138, CurrSamplesPerSec=5.6861664269358885, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:47:57,593] [INFO] [timer.py:197:stop] 0/710, RunningAvgSamplesPerSec=6.327316883649861, CurrSamplesPerSec=5.729989789413974, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:48:08,972] [INFO] [timer.py:197:stop] 0/712, RunningAvgSamplesPerSec=6.327270752057368, CurrSamplesPerSec=5.654470688433138, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:48:20,337] [INFO] [timer.py:197:stop] 0/714, RunningAvgSamplesPerSec=6.3272791064630285, CurrSamplesPerSec=5.7056947555522, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:48:31,825] [INFO] [timer.py:197:stop] 0/716, RunningAvgSamplesPerSec=6.327258717853537, CurrSamplesPerSec=5.680939701404044, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:48:43,188] [INFO] [timer.py:197:stop] 0/718, RunningAvgSamplesPerSec=6.327295608249954, CurrSamplesPerSec=5.7009349827558875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:48:54,690] [INFO] [logging.py:68:log_dist] [Rank 0] step=360, skipped=5, lr=[9.44889475969735e-06], mom=[[0.9, 0.999]] [2022-12-16 20:48:54,691] [INFO] [timer.py:197:stop] 0/720, RunningAvgSamplesPerSec=6.327035979572224, CurrSamplesPerSec=5.538819333340514, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:49:06,214] [INFO] [timer.py:197:stop] 0/722, RunningAvgSamplesPerSec=6.3271111128208615, CurrSamplesPerSec=5.722535309390941, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:49:17,701] [INFO] [timer.py:197:stop] 0/724, RunningAvgSamplesPerSec=6.327128664425442, CurrSamplesPerSec=5.708196333806104, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:49:29,078] [INFO] [timer.py:197:stop] 0/726, RunningAvgSamplesPerSec=6.327087552167851, CurrSamplesPerSec=5.676013965040711, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:49:40,463] [INFO] [timer.py:197:stop] 0/728, RunningAvgSamplesPerSec=6.327109934448353, CurrSamplesPerSec=5.6931926341853965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:49:51,845] [INFO] [timer.py:197:stop] 0/730, RunningAvgSamplesPerSec=6.327038089475233, CurrSamplesPerSec=5.669245100330484, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:50:03,266] [INFO] [timer.py:197:stop] 0/732, RunningAvgSamplesPerSec=6.326924670910652, CurrSamplesPerSec=5.621069038048019, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:50:14,582] [INFO] [timer.py:197:stop] 0/734, RunningAvgSamplesPerSec=6.326959216687658, CurrSamplesPerSec=5.701227997844524, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-16 20:50:25,927] [INFO] [timer.py:197:stop] 0/736, RunningAvgSamplesPerSec=6.326940875652652, CurrSamplesPerSec=5.674422250368862, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:50:37,316] [INFO] [timer.py:197:stop] 0/738, RunningAvgSamplesPerSec=6.326823613295239, CurrSamplesPerSec=5.6246842242873045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:50:48,661] [INFO] [logging.py:68:log_dist] [Rank 0] step=370, skipped=5, lr=[9.493595187571683e-06], mom=[[0.9, 0.999]] [2022-12-16 20:50:48,663] [INFO] [timer.py:197:stop] 0/740, RunningAvgSamplesPerSec=6.3268314840496505, CurrSamplesPerSec=5.707714969358892, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:51:00,070] [INFO] [timer.py:197:stop] 0/742, RunningAvgSamplesPerSec=6.326866428321336, CurrSamplesPerSec=5.711735384974162, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:51:11,574] [INFO] [timer.py:197:stop] 0/744, RunningAvgSamplesPerSec=6.326713261271251, CurrSamplesPerSec=5.631804735285801, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:51:22,930] [INFO] [timer.py:197:stop] 0/746, RunningAvgSamplesPerSec=6.326691157515512, CurrSamplesPerSec=5.673347214431089, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:51:34,299] [INFO] [timer.py:197:stop] 0/748, RunningAvgSamplesPerSec=6.326638394503159, CurrSamplesPerSec=5.668620646503149, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:51:45,734] [INFO] [timer.py:197:stop] 0/750, RunningAvgSamplesPerSec=6.326511199481409, CurrSamplesPerSec=5.6745774710472, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0671, 'learning_rate': 9.51548820454122e-06, 'epoch': 1.59} [2022-12-16 20:51:57,109] [INFO] [timer.py:197:stop] 0/752, RunningAvgSamplesPerSec=6.326485323384035, CurrSamplesPerSec=5.6853226941193356, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:52:08,661] [INFO] [timer.py:197:stop] 0/754, RunningAvgSamplesPerSec=6.326162477755519, 
CurrSamplesPerSec=5.498609592683866, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:52:19,985] [INFO] [timer.py:197:stop] 0/756, RunningAvgSamplesPerSec=6.326188890677415, CurrSamplesPerSec=5.701028695875893, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:52:31,324] [INFO] [timer.py:197:stop] 0/758, RunningAvgSamplesPerSec=6.326163715081771, CurrSamplesPerSec=5.669143569330741, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:52:42,739] [INFO] [logging.py:68:log_dist] [Rank 0] step=380, skipped=5, lr=[9.53708734662638e-06], mom=[[0.9, 0.999]] [2022-12-16 20:52:42,741] [INFO] [timer.py:197:stop] 0/760, RunningAvgSamplesPerSec=6.326036630362106, CurrSamplesPerSec=5.680581689222482, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:52:54,099] [INFO] [timer.py:197:stop] 0/762, RunningAvgSamplesPerSec=6.326032010296816, CurrSamplesPerSec=5.697016078956156, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:53:05,616] [INFO] [timer.py:197:stop] 0/764, RunningAvgSamplesPerSec=6.325775425618378, CurrSamplesPerSec=5.5389985402003274, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:53:17,029] [INFO] [timer.py:197:stop] 0/766, RunningAvgSamplesPerSec=6.325833860421, CurrSamplesPerSec=5.722426005204077, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:53:28,386] [INFO] [timer.py:197:stop] 0/768, RunningAvgSamplesPerSec=6.325830415376826, CurrSamplesPerSec=5.686190275721349, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:53:39,816] [INFO] [timer.py:197:stop] 0/770, RunningAvgSamplesPerSec=6.325709680080751, CurrSamplesPerSec=5.627372182760118, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:53:51,153] [INFO] [timer.py:197:stop] 0/772, RunningAvgSamplesPerSec=6.32574588936009, CurrSamplesPerSec=5.720313205022317, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:54:02,473] [INFO] [timer.py:197:stop] 0/774, RunningAvgSamplesPerSec=6.325779808632974, 
CurrSamplesPerSec=5.700795024194597, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:54:14,088] [INFO] [timer.py:197:stop] 0/776, RunningAvgSamplesPerSec=6.325366824020554, CurrSamplesPerSec=5.711546041938402, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:54:25,416] [INFO] [timer.py:197:stop] 0/778, RunningAvgSamplesPerSec=6.325388120195118, CurrSamplesPerSec=5.688599551753485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:54:36,837] [INFO] [logging.py:68:log_dist] [Rank 0] step=390, skipped=5, lr=[9.57943484127219e-06], mom=[[0.9, 0.999]] [2022-12-16 20:54:36,839] [INFO] [timer.py:197:stop] 0/780, RunningAvgSamplesPerSec=6.325287416370516, CurrSamplesPerSec=5.61396389033286, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:54:48,184] [INFO] [timer.py:197:stop] 0/782, RunningAvgSamplesPerSec=6.325311851558554, CurrSamplesPerSec=5.685149787510647, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:54:59,503] [INFO] [timer.py:197:stop] 0/784, RunningAvgSamplesPerSec=6.325374166849946, CurrSamplesPerSec=5.711454413166849, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:55:10,900] [INFO] [timer.py:197:stop] 0/786, RunningAvgSamplesPerSec=6.3253531728228385, CurrSamplesPerSec=5.716634956736923, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:55:22,255] [INFO] [timer.py:197:stop] 0/788, RunningAvgSamplesPerSec=6.3253672114173085, CurrSamplesPerSec=5.696488243568338, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:55:33,802] [INFO] [timer.py:197:stop] 0/790, RunningAvgSamplesPerSec=6.325068850147272, CurrSamplesPerSec=5.498009999649353, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:55:45,139] [INFO] [timer.py:197:stop] 0/792, RunningAvgSamplesPerSec=6.325112867831402, CurrSamplesPerSec=5.697386081443267, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:55:56,474] [INFO] [timer.py:197:stop] 0/794, RunningAvgSamplesPerSec=6.325150134664613, 
CurrSamplesPerSec=5.681703002256421, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:56:07,910] [INFO] [timer.py:197:stop] 0/796, RunningAvgSamplesPerSec=6.325005608906404, CurrSamplesPerSec=5.594190346236312, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:56:19,261] [INFO] [timer.py:197:stop] 0/798, RunningAvgSamplesPerSec=6.3250222821537205, CurrSamplesPerSec=5.69781031336407, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:56:30,630] [INFO] [logging.py:68:log_dist] [Rank 0] step=400, skipped=5, lr=[9.620696382156558e-06], mom=[[0.9, 0.999]] [2022-12-16 20:56:30,632] [INFO] [timer.py:197:stop] 0/800, RunningAvgSamplesPerSec=6.324955084688922, CurrSamplesPerSec=5.645958662739079, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0739, 'learning_rate': 9.620696382156558e-06, 'epoch': 1.69} [2022-12-16 20:56:42,048] [INFO] [timer.py:197:stop] 0/802, RunningAvgSamplesPerSec=6.324866416048774, CurrSamplesPerSec=5.690673302590797, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:56:53,416] [INFO] [timer.py:197:stop] 0/804, RunningAvgSamplesPerSec=6.324849206965416, CurrSamplesPerSec=5.679540608704553, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:57:04,806] [INFO] [timer.py:197:stop] 0/806, RunningAvgSamplesPerSec=6.324771230961369, CurrSamplesPerSec=5.641830353065095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:57:16,167] [INFO] [timer.py:197:stop] 0/808, RunningAvgSamplesPerSec=6.324769497199892, CurrSamplesPerSec=5.689590894872664, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:57:27,551] [INFO] [timer.py:197:stop] 0/810, RunningAvgSamplesPerSec=6.324736821415485, CurrSamplesPerSec=5.685807755696759, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:57:39,228] [INFO] [timer.py:197:stop] 0/812, RunningAvgSamplesPerSec=6.324646795857594, CurrSamplesPerSec=5.683070915568724, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:57:50,570] [INFO] [timer.py:197:stop] 
0/814, RunningAvgSamplesPerSec=6.324672285175087, CurrSamplesPerSec=5.70057977262278, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:58:01,922] [INFO] [timer.py:197:stop] 0/816, RunningAvgSamplesPerSec=6.324660014668366, CurrSamplesPerSec=5.694164803509382, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:58:13,588] [INFO] [timer.py:197:stop] 0/818, RunningAvgSamplesPerSec=6.324670226228126, CurrSamplesPerSec=5.698775107760346, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:58:24,936] [INFO] [logging.py:68:log_dist] [Rank 0] step=410, skipped=5, lr=[9.660926275674324e-06], mom=[[0.9, 0.999]] [2022-12-16 20:58:24,938] [INFO] [timer.py:197:stop] 0/820, RunningAvgSamplesPerSec=6.3246971948466655, CurrSamplesPerSec=5.701889208658522, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:58:36,585] [INFO] [timer.py:197:stop] 0/822, RunningAvgSamplesPerSec=6.324232119390502, CurrSamplesPerSec=5.404802825653943, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:58:47,931] [INFO] [timer.py:197:stop] 0/824, RunningAvgSamplesPerSec=6.324252861625952, CurrSamplesPerSec=5.691459011684438, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:58:59,279] [INFO] [timer.py:197:stop] 0/826, RunningAvgSamplesPerSec=6.324276862279914, CurrSamplesPerSec=5.691145039857559, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:59:10,688] [INFO] [timer.py:197:stop] 0/828, RunningAvgSamplesPerSec=6.324198814394731, CurrSamplesPerSec=5.654148874633871, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:59:22,046] [INFO] [timer.py:197:stop] 0/830, RunningAvgSamplesPerSec=6.324261490328659, CurrSamplesPerSec=5.718737973915092, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:59:33,412] [INFO] [timer.py:197:stop] 0/832, RunningAvgSamplesPerSec=6.324283135016285, CurrSamplesPerSec=5.693500069483766, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:59:44,762] [INFO] [timer.py:197:stop] 0/834, 
RunningAvgSamplesPerSec=6.3243167430060785, CurrSamplesPerSec=5.695427790856719, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 20:59:56,411] [INFO] [timer.py:197:stop] 0/836, RunningAvgSamplesPerSec=6.3243108942685415, CurrSamplesPerSec=5.677710568918598, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:00:08,280] [INFO] [timer.py:197:stop] 0/838, RunningAvgSamplesPerSec=6.324302607187967, CurrSamplesPerSec=5.6794915808427024, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:00:20,196] [INFO] [logging.py:68:log_dist] [Rank 0] step=420, skipped=5, lr=[9.700174853763023e-06], mom=[[0.9, 0.999]] [2022-12-16 21:00:20,197] [INFO] [timer.py:197:stop] 0/840, RunningAvgSamplesPerSec=6.324301713290257, CurrSamplesPerSec=5.685020234855203, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:00:32,195] [INFO] [timer.py:197:stop] 0/842, RunningAvgSamplesPerSec=6.324186164489815, CurrSamplesPerSec=5.6956958278894705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:00:43,595] [INFO] [timer.py:197:stop] 0/844, RunningAvgSamplesPerSec=6.324133462309744, CurrSamplesPerSec=5.6358783857465875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:00:55,109] [INFO] [timer.py:197:stop] 0/846, RunningAvgSamplesPerSec=6.324170851164534, CurrSamplesPerSec=5.717252256195339, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:01:06,656] [INFO] [timer.py:197:stop] 0/848, RunningAvgSamplesPerSec=6.324132295042807, CurrSamplesPerSec=5.647554883343113, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:01:18,191] [INFO] [timer.py:197:stop] 0/850, RunningAvgSamplesPerSec=6.323868844928134, CurrSamplesPerSec=5.5051387907455664, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0675, 'learning_rate': 9.719445885591654e-06, 'epoch': 1.8} [2022-12-16 21:01:29,778] [INFO] [timer.py:197:stop] 0/852, RunningAvgSamplesPerSec=6.323895984034452, CurrSamplesPerSec=5.701067683378528, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-16 21:01:41,155] [INFO] [timer.py:197:stop] 0/854, RunningAvgSamplesPerSec=6.323866025614896, CurrSamplesPerSec=5.680294398450153, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:01:52,629] [INFO] [timer.py:197:stop] 0/856, RunningAvgSamplesPerSec=6.323646641240253, CurrSamplesPerSec=5.551309222463983, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:02:03,977] [INFO] [timer.py:197:stop] 0/858, RunningAvgSamplesPerSec=6.323661475227139, CurrSamplesPerSec=5.7159530384475685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:02:15,298] [INFO] [logging.py:68:log_dist] [Rank 0] step=430, skipped=5, lr=[9.738488852516646e-06], mom=[[0.9, 0.999]] [2022-12-16 21:02:15,300] [INFO] [timer.py:197:stop] 0/860, RunningAvgSamplesPerSec=6.323710149257949, CurrSamplesPerSec=5.719581661471569, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:02:26,710] [INFO] [timer.py:197:stop] 0/862, RunningAvgSamplesPerSec=6.323634804344302, CurrSamplesPerSec=5.62526862139189, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:02:38,037] [INFO] [timer.py:197:stop] 0/864, RunningAvgSamplesPerSec=6.323649281830787, CurrSamplesPerSec=5.681464178147183, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:02:49,370] [INFO] [timer.py:197:stop] 0/866, RunningAvgSamplesPerSec=6.3236838467673975, CurrSamplesPerSec=5.705553350483724, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:03:00,794] [INFO] [timer.py:197:stop] 0/868, RunningAvgSamplesPerSec=6.323587287957588, CurrSamplesPerSec=5.591940037322728, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:03:12,130] [INFO] [timer.py:197:stop] 0/870, RunningAvgSamplesPerSec=6.323606237462717, CurrSamplesPerSec=5.7091401213955075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:03:23,687] [INFO] [timer.py:197:stop] 0/872, RunningAvgSamplesPerSec=6.323648030461905, CurrSamplesPerSec=5.714125248491172, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 
21:03:35,327] [INFO] [timer.py:197:stop] 0/874, RunningAvgSamplesPerSec=6.32362972848686, CurrSamplesPerSec=5.665148674893301, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:03:46,681] [INFO] [timer.py:197:stop] 0/876, RunningAvgSamplesPerSec=6.323635151245853, CurrSamplesPerSec=5.690080062248546, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:03:58,094] [INFO] [timer.py:197:stop] 0/878, RunningAvgSamplesPerSec=6.323670798847649, CurrSamplesPerSec=5.71115840184258, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:04:09,744] [INFO] [logging.py:68:log_dist] [Rank 0] step=440, skipped=5, lr=[9.775911746761854e-06], mom=[[0.9, 0.999]] [2022-12-16 21:04:09,746] [INFO] [timer.py:197:stop] 0/880, RunningAvgSamplesPerSec=6.323698927835154, CurrSamplesPerSec=5.699957111577041, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:04:21,036] [INFO] [timer.py:197:stop] 0/882, RunningAvgSamplesPerSec=6.323752552261068, CurrSamplesPerSec=5.719672819971721, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:04:32,626] [INFO] [timer.py:197:stop] 0/884, RunningAvgSamplesPerSec=6.323770566065425, CurrSamplesPerSec=5.690295486589126, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:04:44,097] [INFO] [timer.py:197:stop] 0/886, RunningAvgSamplesPerSec=6.323742611986993, CurrSamplesPerSec=5.6883239854983, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:04:55,411] [INFO] [timer.py:197:stop] 0/888, RunningAvgSamplesPerSec=6.323806743121909, CurrSamplesPerSec=5.707230045830403, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:05:06,778] [INFO] [timer.py:197:stop] 0/890, RunningAvgSamplesPerSec=6.323798891507006, CurrSamplesPerSec=5.698139293801706, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:05:18,153] [INFO] [timer.py:197:stop] 0/892, RunningAvgSamplesPerSec=6.323797970601452, CurrSamplesPerSec=5.689324155599842, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:05:29,477] [INFO] 
[timer.py:197:stop] 0/894, RunningAvgSamplesPerSec=6.323804280166466, CurrSamplesPerSec=5.693873238627004, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:05:40,811] [INFO] [timer.py:197:stop] 0/896, RunningAvgSamplesPerSec=6.32381869024255, CurrSamplesPerSec=5.687272353769752, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:05:52,541] [INFO] [timer.py:197:stop] 0/898, RunningAvgSamplesPerSec=6.323304078211752, CurrSamplesPerSec=5.66975376791339, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:06:03,859] [INFO] [logging.py:68:log_dist] [Rank 0] step=450, skipped=5, lr=[9.812484046603779e-06], mom=[[0.9, 0.999]] [2022-12-16 21:06:03,860] [INFO] [timer.py:197:stop] 0/900, RunningAvgSamplesPerSec=6.323361252627297, CurrSamplesPerSec=5.696158486770488, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0662, 'learning_rate': 9.812484046603779e-06, 'epoch': 1.91} [2022-12-16 21:06:15,202] [INFO] [timer.py:197:stop] 0/902, RunningAvgSamplesPerSec=6.323392772084618, CurrSamplesPerSec=5.693829035456161, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:06:26,792] [INFO] [timer.py:197:stop] 0/904, RunningAvgSamplesPerSec=6.323328765368771, CurrSamplesPerSec=5.675209477210574, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:06:38,301] [INFO] [timer.py:197:stop] 0/906, RunningAvgSamplesPerSec=6.323365724637102, CurrSamplesPerSec=5.71127262260955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:06:49,669] [INFO] [timer.py:197:stop] 0/908, RunningAvgSamplesPerSec=6.323331986885105, CurrSamplesPerSec=5.659760969514658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:07:01,224] [INFO] [timer.py:197:stop] 0/910, RunningAvgSamplesPerSec=6.323368369110442, CurrSamplesPerSec=5.691273423843159, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:07:12,732] [INFO] [timer.py:197:stop] 0/912, RunningAvgSamplesPerSec=6.3233877302253045, CurrSamplesPerSec=5.696535631113576, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-16 21:07:24,152] [INFO] [timer.py:197:stop] 0/914, RunningAvgSamplesPerSec=6.323333379923915, CurrSamplesPerSec=5.6425592209689155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:07:35,594] [INFO] [timer.py:197:stop] 0/916, RunningAvgSamplesPerSec=6.323348683822525, CurrSamplesPerSec=5.708248043439559, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:07:47,094] [INFO] [timer.py:197:stop] 0/918, RunningAvgSamplesPerSec=6.323396275366316, CurrSamplesPerSec=5.695279160815137, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:07:58,467] [INFO] [logging.py:68:log_dist] [Rank 0] step=460, skipped=5, lr=[9.84824356101363e-06], mom=[[0.9, 0.999]] [2022-12-16 21:07:58,468] [INFO] [timer.py:197:stop] 0/920, RunningAvgSamplesPerSec=6.323380406432734, CurrSamplesPerSec=5.670665239188501, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:08:10,097] [INFO] [timer.py:197:stop] 0/922, RunningAvgSamplesPerSec=6.323363664915916, CurrSamplesPerSec=5.695054176276154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:08:21,496] [INFO] [timer.py:197:stop] 0/924, RunningAvgSamplesPerSec=6.323303982435558, CurrSamplesPerSec=5.652892698429706, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:08:32,970] [INFO] [timer.py:197:stop] 0/926, RunningAvgSamplesPerSec=6.323196876502637, CurrSamplesPerSec=5.609877765122621, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:08:44,327] [INFO] [timer.py:197:stop] 0/928, RunningAvgSamplesPerSec=6.3232058943138485, CurrSamplesPerSec=5.68537928839975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:08:55,686] [INFO] [timer.py:197:stop] 0/930, RunningAvgSamplesPerSec=6.323205497567241, CurrSamplesPerSec=5.706007909046854, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:09:07,353] [INFO] [timer.py:197:stop] 0/932, RunningAvgSamplesPerSec=6.322831418899166, CurrSamplesPerSec=5.399923920213184, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-16 21:09:18,735] [INFO] [timer.py:197:stop] 0/934, RunningAvgSamplesPerSec=6.32280846623546, CurrSamplesPerSec=5.669265454860022, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:09:30,257] [INFO] [timer.py:197:stop] 0/936, RunningAvgSamplesPerSec=6.322880930092381, CurrSamplesPerSec=5.72270659354622, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:09:41,888] [INFO] [timer.py:197:stop] 0/938, RunningAvgSamplesPerSec=6.322911512653137, CurrSamplesPerSec=5.6971461789274604, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:09:53,289] [INFO] [logging.py:68:log_dist] [Rank 0] step=470, skipped=5, lr=[9.883225632758308e-06], mom=[[0.9, 0.999]] [2022-12-16 21:09:53,292] [INFO] [timer.py:197:stop] 0/940, RunningAvgSamplesPerSec=6.322874284590231, CurrSamplesPerSec=5.65775308872727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:10:04,647] [INFO] [timer.py:197:stop] 0/942, RunningAvgSamplesPerSec=6.322877183552834, CurrSamplesPerSec=5.6954686352675905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:10:13,197] [INFO] [timer.py:197:stop] 0/944, RunningAvgSamplesPerSec=6.326150664923261, CurrSamplesPerSec=10.148292026847534, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:10:24,559] [INFO] [timer.py:197:stop] 0/946, RunningAvgSamplesPerSec=6.326127489531616, CurrSamplesPerSec=5.6568171532502465, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:10:35,885] [INFO] [timer.py:197:stop] 0/948, RunningAvgSamplesPerSec=6.3261891603196, CurrSamplesPerSec=5.717331163249521, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:10:47,312] [INFO] [timer.py:197:stop] 0/950, RunningAvgSamplesPerSec=6.32620549424659, CurrSamplesPerSec=5.708362390603286, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0615, 'learning_rate': 9.900435550016748e-06, 'epoch': 2.01} [2022-12-16 21:10:58,668] [INFO] [timer.py:197:stop] 0/952, RunningAvgSamplesPerSec=6.326192318553131, 
CurrSamplesPerSec=5.658891651344986, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:11:10,010] [INFO] [timer.py:197:stop] 0/954, RunningAvgSamplesPerSec=6.326204475079454, CurrSamplesPerSec=5.68568973408932, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:11:21,376] [INFO] [timer.py:197:stop] 0/956, RunningAvgSamplesPerSec=6.326191030150801, CurrSamplesPerSec=5.6838889469060225, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:11:32,728] [INFO] [timer.py:197:stop] 0/958, RunningAvgSamplesPerSec=6.326197330889036, CurrSamplesPerSec=5.6822468649952, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:11:44,292] [INFO] [logging.py:68:log_dist] [Rank 0] step=480, skipped=5, lr=[9.917463348331534e-06], mom=[[0.9, 0.999]] [2022-12-16 21:11:44,298] [INFO] [timer.py:197:stop] 0/960, RunningAvgSamplesPerSec=6.3259327540553665, CurrSamplesPerSec=5.469384964136454, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:11:55,877] [INFO] [timer.py:197:stop] 0/962, RunningAvgSamplesPerSec=6.325935854692364, CurrSamplesPerSec=5.712130639515811, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:12:07,434] [INFO] [timer.py:197:stop] 0/964, RunningAvgSamplesPerSec=6.325921725199111, CurrSamplesPerSec=5.700722626182441, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:12:18,923] [INFO] [timer.py:197:stop] 0/966, RunningAvgSamplesPerSec=6.325759514089763, CurrSamplesPerSec=5.568770311968856, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:12:30,325] [INFO] [timer.py:197:stop] 0/968, RunningAvgSamplesPerSec=6.325740953625998, CurrSamplesPerSec=5.680124922543402, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:12:41,706] [INFO] [timer.py:197:stop] 0/970, RunningAvgSamplesPerSec=6.325739440815068, CurrSamplesPerSec=5.713497922489823, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:12:53,334] [INFO] [timer.py:197:stop] 0/972, RunningAvgSamplesPerSec=6.325375565220704, 
CurrSamplesPerSec=5.402155811404497, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:13:04,701] [INFO] [timer.py:197:stop] 0/974, RunningAvgSamplesPerSec=6.3253512622784145, CurrSamplesPerSec=5.666073017760117, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:13:16,312] [INFO] [timer.py:197:stop] 0/976, RunningAvgSamplesPerSec=6.325380050460924, CurrSamplesPerSec=5.706268694383452, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:13:28,013] [INFO] [timer.py:197:stop] 0/978, RunningAvgSamplesPerSec=6.325435711327527, CurrSamplesPerSec=5.72600691245085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:13:39,351] [INFO] [logging.py:68:log_dist] [Rank 0] step=490, skipped=5, lr=[9.950987726012135e-06], mom=[[0.9, 0.999]] [2022-12-16 21:13:39,353] [INFO] [timer.py:197:stop] 0/980, RunningAvgSamplesPerSec=6.325457773977271, CurrSamplesPerSec=5.694198382492763, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:13:50,967] [INFO] [timer.py:197:stop] 0/982, RunningAvgSamplesPerSec=6.325489117445667, CurrSamplesPerSec=5.702591761786533, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:14:02,677] [INFO] [timer.py:197:stop] 0/984, RunningAvgSamplesPerSec=6.325408873295665, CurrSamplesPerSec=5.6763934878966475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:14:14,003] [INFO] [timer.py:197:stop] 0/986, RunningAvgSamplesPerSec=6.325441453698785, CurrSamplesPerSec=5.705198777174381, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:14:25,351] [INFO] [timer.py:197:stop] 0/988, RunningAvgSamplesPerSec=6.325463222633959, CurrSamplesPerSec=5.6933520230469705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:14:36,810] [INFO] [timer.py:197:stop] 0/990, RunningAvgSamplesPerSec=6.325347026707095, CurrSamplesPerSec=5.7087785461304295, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:14:48,122] [INFO] [timer.py:197:stop] 0/992, RunningAvgSamplesPerSec=6.3254064143464, 
CurrSamplesPerSec=5.71673381308937, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:14:59,654] [INFO] [timer.py:197:stop] 0/994, RunningAvgSamplesPerSec=6.325195870913208, CurrSamplesPerSec=5.491621718647289, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:15:11,087] [INFO] [timer.py:197:stop] 0/996, RunningAvgSamplesPerSec=6.325231062084005, CurrSamplesPerSec=5.711203360535411, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:15:22,418] [INFO] [timer.py:197:stop] 0/998, RunningAvgSamplesPerSec=6.3252868808302996, CurrSamplesPerSec=5.712874868002988, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:15:33,989] [INFO] [logging.py:68:log_dist] [Rank 0] step=500, skipped=5, lr=[9.98382788472848e-06], mom=[[0.9, 0.999]] [2022-12-16 21:15:33,991] [INFO] [timer.py:197:stop] 0/1000, RunningAvgSamplesPerSec=6.325035740808991, CurrSamplesPerSec=5.474089503245498, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0323, 'learning_rate': 9.98382788472848e-06, 'epoch': 2.12} [2022-12-16 21:15:45,433] [INFO] [timer.py:197:stop] 0/1002, RunningAvgSamplesPerSec=6.325084596844273, CurrSamplesPerSec=5.713662342040977, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:15:56,836] [INFO] [timer.py:197:stop] 0/1004, RunningAvgSamplesPerSec=6.3251158291307386, CurrSamplesPerSec=5.713022229276716, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:16:08,258] [INFO] [timer.py:197:stop] 0/1006, RunningAvgSamplesPerSec=6.325055351476069, CurrSamplesPerSec=5.623066744820601, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:16:19,582] [INFO] [timer.py:197:stop] 0/1008, RunningAvgSamplesPerSec=6.325102609875333, CurrSamplesPerSec=5.707804050857386, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:16:30,974] [INFO] [timer.py:197:stop] 0/1010, RunningAvgSamplesPerSec=6.325160154746798, CurrSamplesPerSec=5.720153278154641, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:16:42,337] [INFO] 
[timer.py:197:stop] 0/1012, RunningAvgSamplesPerSec=6.32517427631391, CurrSamplesPerSec=5.688713354181847, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:16:53,934] [INFO] [timer.py:197:stop] 0/1014, RunningAvgSamplesPerSec=6.325238765075366, CurrSamplesPerSec=5.72578779867803, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:17:05,268] [INFO] [timer.py:197:stop] 0/1016, RunningAvgSamplesPerSec=6.325275504519772, CurrSamplesPerSec=5.704129988366727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:17:16,885] [INFO] [timer.py:197:stop] 0/1018, RunningAvgSamplesPerSec=6.324959984011779, CurrSamplesPerSec=5.429061828725207, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:17:28,234] [INFO] [logging.py:68:log_dist] [Rank 0] step=510, skipped=5, lr=[9.991111111111112e-06], mom=[[0.9, 0.999]] [2022-12-16 21:17:28,236] [INFO] [timer.py:197:stop] 0/1020, RunningAvgSamplesPerSec=6.3249907983214415, CurrSamplesPerSec=5.71215640831638, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:17:39,550] [INFO] [timer.py:197:stop] 0/1022, RunningAvgSamplesPerSec=6.325041232963377, CurrSamplesPerSec=5.714046186639779, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:17:51,113] [INFO] [timer.py:197:stop] 0/1024, RunningAvgSamplesPerSec=6.32477933640267, CurrSamplesPerSec=5.468682765885077, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:18:02,491] [INFO] [timer.py:197:stop] 0/1026, RunningAvgSamplesPerSec=6.324800913545075, CurrSamplesPerSec=5.684177323585509, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:18:13,861] [INFO] [timer.py:197:stop] 0/1028, RunningAvgSamplesPerSec=6.324787956456297, CurrSamplesPerSec=5.6712122626343895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:18:25,298] [INFO] [timer.py:197:stop] 0/1030, RunningAvgSamplesPerSec=6.324702766601665, CurrSamplesPerSec=5.589238563916994, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:18:36,685] [INFO] 
[timer.py:197:stop] 0/1032, RunningAvgSamplesPerSec=6.324675894419773, CurrSamplesPerSec=5.675779219192053, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:18:48,075] [INFO] [timer.py:197:stop] 0/1034, RunningAvgSamplesPerSec=6.324696787274121, CurrSamplesPerSec=5.694624797553552, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:18:59,489] [INFO] [timer.py:197:stop] 0/1036, RunningAvgSamplesPerSec=6.324615631271562, CurrSamplesPerSec=5.6163487641731304, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:19:10,966] [INFO] [timer.py:197:stop] 0/1038, RunningAvgSamplesPerSec=6.32453266339884, CurrSamplesPerSec=5.621838704111991, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:19:22,518] [INFO] [logging.py:68:log_dist] [Rank 0] step=520, skipped=5, lr=[9.96888888888889e-06], mom=[[0.9, 0.999]] [2022-12-16 21:19:22,519] [INFO] [timer.py:197:stop] 0/1040, RunningAvgSamplesPerSec=6.324513595083433, CurrSamplesPerSec=5.658441467170767, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:19:33,943] [INFO] [timer.py:197:stop] 0/1042, RunningAvgSamplesPerSec=6.324515006176603, CurrSamplesPerSec=5.7085328273627, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:19:45,299] [INFO] [timer.py:197:stop] 0/1044, RunningAvgSamplesPerSec=6.324501904454941, CurrSamplesPerSec=5.677590721690172, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:19:56,828] [INFO] [timer.py:197:stop] 0/1046, RunningAvgSamplesPerSec=6.324538332751554, CurrSamplesPerSec=5.715457708942437, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:20:08,266] [INFO] [timer.py:197:stop] 0/1048, RunningAvgSamplesPerSec=6.324571637922376, CurrSamplesPerSec=5.7061095518455724, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:20:19,629] [INFO] [timer.py:197:stop] 0/1050, RunningAvgSamplesPerSec=6.324574801815149, CurrSamplesPerSec=5.683213133585038, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0333, 'learning_rate': 
9.957777777777779e-06, 'epoch': 2.22} [2022-12-16 21:20:30,987] [INFO] [timer.py:197:stop] 0/1052, RunningAvgSamplesPerSec=6.324583864783749, CurrSamplesPerSec=5.699014179541857, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:20:42,360] [INFO] [timer.py:197:stop] 0/1054, RunningAvgSamplesPerSec=6.324593248570674, CurrSamplesPerSec=5.691380817152285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:20:53,737] [INFO] [timer.py:197:stop] 0/1056, RunningAvgSamplesPerSec=6.32459481413206, CurrSamplesPerSec=5.6858901328649205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:21:05,096] [INFO] [timer.py:197:stop] 0/1058, RunningAvgSamplesPerSec=6.324578587579753, CurrSamplesPerSec=5.666338298872446, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:21:16,463] [INFO] [logging.py:68:log_dist] [Rank 0] step=530, skipped=5, lr=[9.946666666666667e-06], mom=[[0.9, 0.999]] [2022-12-16 21:21:16,465] [INFO] [timer.py:197:stop] 0/1060, RunningAvgSamplesPerSec=6.324570744039958, CurrSamplesPerSec=5.681243890929784, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:21:27,793] [INFO] [timer.py:197:stop] 0/1062, RunningAvgSamplesPerSec=6.324617494273643, CurrSamplesPerSec=5.72019130873348, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:21:39,228] [INFO] [timer.py:197:stop] 0/1064, RunningAvgSamplesPerSec=6.324530287745588, CurrSamplesPerSec=5.622556998055028, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:21:50,614] [INFO] [timer.py:197:stop] 0/1066, RunningAvgSamplesPerSec=6.324571372955135, CurrSamplesPerSec=5.71121672676978, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:22:02,217] [INFO] [timer.py:197:stop] 0/1068, RunningAvgSamplesPerSec=6.324523452438587, CurrSamplesPerSec=5.70700047614419, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:22:13,577] [INFO] [timer.py:197:stop] 0/1070, RunningAvgSamplesPerSec=6.324533424174116, CurrSamplesPerSec=5.6957316002942715, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-16 21:22:25,169] [INFO] [timer.py:197:stop] 0/1072, RunningAvgSamplesPerSec=6.324543786478323, CurrSamplesPerSec=5.690434930076675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:22:36,671] [INFO] [timer.py:197:stop] 0/1074, RunningAvgSamplesPerSec=6.324471864626404, CurrSamplesPerSec=5.684128937809232, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:22:48,004] [INFO] [timer.py:197:stop] 0/1076, RunningAvgSamplesPerSec=6.324501261828952, CurrSamplesPerSec=5.704121018830548, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:22:59,356] [INFO] [timer.py:197:stop] 0/1078, RunningAvgSamplesPerSec=6.3244863543294665, CurrSamplesPerSec=5.678301472905768, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:23:10,753] [INFO] [logging.py:68:log_dist] [Rank 0] step=540, skipped=5, lr=[9.924444444444445e-06], mom=[[0.9, 0.999]] [2022-12-16 21:23:10,755] [INFO] [timer.py:197:stop] 0/1080, RunningAvgSamplesPerSec=6.324440605807113, CurrSamplesPerSec=5.695234694147735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:23:22,089] [INFO] [timer.py:197:stop] 0/1082, RunningAvgSamplesPerSec=6.324469069745208, CurrSamplesPerSec=5.685915906328817, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:23:33,429] [INFO] [timer.py:197:stop] 0/1084, RunningAvgSamplesPerSec=6.324481265006702, CurrSamplesPerSec=5.698475086538935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:23:44,770] [INFO] [timer.py:197:stop] 0/1086, RunningAvgSamplesPerSec=6.3245014505218595, CurrSamplesPerSec=5.715240861964207, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:23:56,096] [INFO] [timer.py:197:stop] 0/1088, RunningAvgSamplesPerSec=6.324539515791824, CurrSamplesPerSec=5.726388508460027, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:24:07,404] [INFO] [timer.py:197:stop] 0/1090, RunningAvgSamplesPerSec=6.324577731653262, CurrSamplesPerSec=5.710771543316395, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-16 21:24:19,122] [INFO] [timer.py:197:stop] 0/1092, RunningAvgSamplesPerSec=6.324580708869267, CurrSamplesPerSec=5.683601562199607, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:24:30,678] [INFO] [timer.py:197:stop] 0/1094, RunningAvgSamplesPerSec=6.324609515934535, CurrSamplesPerSec=5.708574831147178, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:24:42,259] [INFO] [timer.py:197:stop] 0/1096, RunningAvgSamplesPerSec=6.32433704712248, CurrSamplesPerSec=5.450139213985013, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:24:53,756] [INFO] [timer.py:197:stop] 0/1098, RunningAvgSamplesPerSec=6.3243561501135, CurrSamplesPerSec=5.696628716087094, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:25:05,322] [INFO] [logging.py:68:log_dist] [Rank 0] step=550, skipped=5, lr=[9.902222222222223e-06], mom=[[0.9, 0.999]] [2022-12-16 21:25:05,323] [INFO] [timer.py:197:stop] 0/1100, RunningAvgSamplesPerSec=6.324366355272438, CurrSamplesPerSec=5.685196504972429, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0356, 'learning_rate': 9.902222222222223e-06, 'epoch': 2.33} [2022-12-16 21:25:16,754] [INFO] [timer.py:197:stop] 0/1102, RunningAvgSamplesPerSec=6.324272647895259, CurrSamplesPerSec=5.596251823753526, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:25:28,285] [INFO] [timer.py:197:stop] 0/1104, RunningAvgSamplesPerSec=6.3242917552540545, CurrSamplesPerSec=5.701883879606733, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:25:39,645] [INFO] [timer.py:197:stop] 0/1106, RunningAvgSamplesPerSec=6.324275104668984, CurrSamplesPerSec=5.686045740695403, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:25:51,097] [INFO] [timer.py:197:stop] 0/1108, RunningAvgSamplesPerSec=6.3241656983143075, CurrSamplesPerSec=5.577740987465537, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:26:02,481] [INFO] [timer.py:197:stop] 0/1110, RunningAvgSamplesPerSec=6.324114079876728, 
CurrSamplesPerSec=5.626380230354328, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:26:13,901] [INFO] [timer.py:197:stop] 0/1112, RunningAvgSamplesPerSec=6.324040600499698, CurrSamplesPerSec=5.613003653490731, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:26:25,478] [INFO] [timer.py:197:stop] 0/1114, RunningAvgSamplesPerSec=6.323775004653686, CurrSamplesPerSec=5.447233828195022, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:26:36,786] [INFO] [timer.py:197:stop] 0/1116, RunningAvgSamplesPerSec=6.3238317786318134, CurrSamplesPerSec=5.725627077052026, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:26:48,119] [INFO] [timer.py:197:stop] 0/1118, RunningAvgSamplesPerSec=6.323858000238239, CurrSamplesPerSec=5.6966565212796345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:26:59,437] [INFO] [logging.py:68:log_dist] [Rank 0] step=560, skipped=5, lr=[9.88e-06], mom=[[0.9, 0.999]] [2022-12-16 21:26:59,438] [INFO] [timer.py:197:stop] 0/1120, RunningAvgSamplesPerSec=6.323885458404519, CurrSamplesPerSec=5.6730858326379225, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:27:10,785] [INFO] [timer.py:197:stop] 0/1122, RunningAvgSamplesPerSec=6.323900620387515, CurrSamplesPerSec=5.690677163038752, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:27:22,351] [INFO] [timer.py:197:stop] 0/1124, RunningAvgSamplesPerSec=6.323915903130612, CurrSamplesPerSec=5.696965539863474, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:27:34,000] [INFO] [timer.py:197:stop] 0/1126, RunningAvgSamplesPerSec=6.323864250923531, CurrSamplesPerSec=5.662085075555017, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:27:45,372] [INFO] [timer.py:197:stop] 0/1128, RunningAvgSamplesPerSec=6.323853592054257, CurrSamplesPerSec=5.660074352251678, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:27:56,729] [INFO] [timer.py:197:stop] 0/1130, RunningAvgSamplesPerSec=6.323836945601488, 
CurrSamplesPerSec=5.681051033395681, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:28:08,190] [INFO] [timer.py:197:stop] 0/1132, RunningAvgSamplesPerSec=6.323840085148342, CurrSamplesPerSec=5.6790268195333224, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:28:19,494] [INFO] [timer.py:197:stop] 0/1134, RunningAvgSamplesPerSec=6.323901462456079, CurrSamplesPerSec=5.720255425533773, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:28:30,806] [INFO] [timer.py:197:stop] 0/1136, RunningAvgSamplesPerSec=6.323955250686382, CurrSamplesPerSec=5.715921149785817, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:28:42,142] [INFO] [timer.py:197:stop] 0/1138, RunningAvgSamplesPerSec=6.323982101549626, CurrSamplesPerSec=5.718288206433784, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:28:53,449] [INFO] [logging.py:68:log_dist] [Rank 0] step=570, skipped=5, lr=[9.857777777777778e-06], mom=[[0.9, 0.999]] [2022-12-16 21:28:53,450] [INFO] [timer.py:197:stop] 0/1140, RunningAvgSamplesPerSec=6.324037655604661, CurrSamplesPerSec=5.70284278480875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:29:04,783] [INFO] [timer.py:197:stop] 0/1142, RunningAvgSamplesPerSec=6.324054750381438, CurrSamplesPerSec=5.6922710102938465, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:29:16,157] [INFO] [timer.py:197:stop] 0/1144, RunningAvgSamplesPerSec=6.324025693628083, CurrSamplesPerSec=5.708841678873624, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:29:27,507] [INFO] [timer.py:197:stop] 0/1146, RunningAvgSamplesPerSec=6.324022208813268, CurrSamplesPerSec=5.691424016952267, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:29:38,855] [INFO] [timer.py:197:stop] 0/1148, RunningAvgSamplesPerSec=6.32402470004462, CurrSamplesPerSec=5.684709862270304, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:29:50,263] [INFO] [timer.py:197:stop] 0/1150, RunningAvgSamplesPerSec=6.324047398655615, 
CurrSamplesPerSec=5.70717083158569, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0348, 'learning_rate': 9.846666666666668e-06, 'epoch': 2.44} [2022-12-16 21:30:01,599] [INFO] [timer.py:197:stop] 0/1152, RunningAvgSamplesPerSec=6.324083612239293, CurrSamplesPerSec=5.733073456941207, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:30:13,060] [INFO] [timer.py:197:stop] 0/1154, RunningAvgSamplesPerSec=6.32398511164039, CurrSamplesPerSec=5.608629462518138, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:30:24,764] [INFO] [timer.py:197:stop] 0/1156, RunningAvgSamplesPerSec=6.32395490744807, CurrSamplesPerSec=5.665791976052485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:30:36,281] [INFO] [timer.py:197:stop] 0/1158, RunningAvgSamplesPerSec=6.323960705938503, CurrSamplesPerSec=5.7071327312664195, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:30:47,832] [INFO] [logging.py:68:log_dist] [Rank 0] step=580, skipped=5, lr=[9.835555555555556e-06], mom=[[0.9, 0.999]] [2022-12-16 21:30:47,832] [INFO] [timer.py:197:stop] 0/1160, RunningAvgSamplesPerSec=6.323719233865647, CurrSamplesPerSec=5.472872328800012, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:30:59,391] [INFO] [timer.py:197:stop] 0/1162, RunningAvgSamplesPerSec=6.32370184523637, CurrSamplesPerSec=5.674197471333234, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:31:10,850] [INFO] [timer.py:197:stop] 0/1164, RunningAvgSamplesPerSec=6.323718795363503, CurrSamplesPerSec=5.697172538195727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:31:22,211] [INFO] [timer.py:197:stop] 0/1166, RunningAvgSamplesPerSec=6.3237189852523725, CurrSamplesPerSec=5.701931114731109, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:31:33,559] [INFO] [timer.py:197:stop] 0/1168, RunningAvgSamplesPerSec=6.323734374802737, CurrSamplesPerSec=5.700105501726921, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:31:44,892] [INFO] 
[timer.py:197:stop] 0/1170, RunningAvgSamplesPerSec=6.323763582374604, CurrSamplesPerSec=5.704374116034424, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:31:56,244] [INFO] [timer.py:197:stop] 0/1172, RunningAvgSamplesPerSec=6.32377243788173, CurrSamplesPerSec=5.694551589747411, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:32:07,599] [INFO] [timer.py:197:stop] 0/1174, RunningAvgSamplesPerSec=6.323779297278022, CurrSamplesPerSec=5.6859893740268745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:32:18,933] [INFO] [timer.py:197:stop] 0/1176, RunningAvgSamplesPerSec=6.323790989360503, CurrSamplesPerSec=5.69161975163729, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:32:30,268] [INFO] [timer.py:197:stop] 0/1178, RunningAvgSamplesPerSec=6.323817303486766, CurrSamplesPerSec=5.726107314664906, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:32:41,713] [INFO] [logging.py:68:log_dist] [Rank 0] step=590, skipped=5, lr=[9.813333333333333e-06], mom=[[0.9, 0.999]] [2022-12-16 21:32:41,714] [INFO] [timer.py:197:stop] 0/1180, RunningAvgSamplesPerSec=6.323843584883303, CurrSamplesPerSec=5.706581425626131, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:32:53,026] [INFO] [timer.py:197:stop] 0/1182, RunningAvgSamplesPerSec=6.3238942954570305, CurrSamplesPerSec=5.729338187061064, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:33:04,406] [INFO] [timer.py:197:stop] 0/1184, RunningAvgSamplesPerSec=6.323874179833749, CurrSamplesPerSec=5.657513889005977, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:33:15,727] [INFO] [timer.py:197:stop] 0/1186, RunningAvgSamplesPerSec=6.323901512870132, CurrSamplesPerSec=5.709603024005327, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:33:27,073] [INFO] [timer.py:197:stop] 0/1188, RunningAvgSamplesPerSec=6.323900985123022, CurrSamplesPerSec=5.709132107478294, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:33:38,449] [INFO] 
[timer.py:197:stop] 0/1190, RunningAvgSamplesPerSec=6.323886756213642, CurrSamplesPerSec=5.694773393442737, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:33:49,761] [INFO] [timer.py:197:stop] 0/1192, RunningAvgSamplesPerSec=6.323938229239878, CurrSamplesPerSec=5.711202145426297, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:34:01,125] [INFO] [timer.py:197:stop] 0/1194, RunningAvgSamplesPerSec=6.323934641391922, CurrSamplesPerSec=5.702443242161101, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:34:12,448] [INFO] [timer.py:197:stop] 0/1196, RunningAvgSamplesPerSec=6.323959876909366, CurrSamplesPerSec=5.703602045869347, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:34:23,812] [INFO] [timer.py:197:stop] 0/1198, RunningAvgSamplesPerSec=6.323975224149886, CurrSamplesPerSec=5.688260341444645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:34:35,164] [INFO] [logging.py:68:log_dist] [Rank 0] step=600, skipped=5, lr=[9.791111111111112e-06], mom=[[0.9, 0.999]] [2022-12-16 21:34:35,166] [INFO] [timer.py:197:stop] 0/1200, RunningAvgSamplesPerSec=6.323984429999333, CurrSamplesPerSec=5.689274958682965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.034, 'learning_rate': 9.791111111111112e-06, 'epoch': 2.54} [2022-12-16 21:34:46,518] [INFO] [timer.py:197:stop] 0/1202, RunningAvgSamplesPerSec=6.323997457450842, CurrSamplesPerSec=5.695443016811219, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:34:58,030] [INFO] [timer.py:197:stop] 0/1204, RunningAvgSamplesPerSec=6.324019030245103, CurrSamplesPerSec=5.679552865802275, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:35:09,353] [INFO] [timer.py:197:stop] 0/1206, RunningAvgSamplesPerSec=6.324070278143984, CurrSamplesPerSec=5.712041909103797, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:35:20,728] [INFO] [timer.py:197:stop] 0/1208, RunningAvgSamplesPerSec=6.324057885317574, CurrSamplesPerSec=5.679879741258521, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-16 21:35:32,055] [INFO] [timer.py:197:stop] 0/1210, RunningAvgSamplesPerSec=6.324093491416535, CurrSamplesPerSec=5.708959449453373, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:35:43,396] [INFO] [timer.py:197:stop] 0/1212, RunningAvgSamplesPerSec=6.324121293486445, CurrSamplesPerSec=5.702608722105414, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:35:54,754] [INFO] [timer.py:197:stop] 0/1214, RunningAvgSamplesPerSec=6.32410811693974, CurrSamplesPerSec=5.677839068060557, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:36:06,166] [INFO] [timer.py:197:stop] 0/1216, RunningAvgSamplesPerSec=6.324056865254654, CurrSamplesPerSec=5.649525567777039, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:36:17,527] [INFO] [timer.py:197:stop] 0/1218, RunningAvgSamplesPerSec=6.324057753210374, CurrSamplesPerSec=5.6874906986638285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:36:28,891] [INFO] [logging.py:68:log_dist] [Rank 0] step=610, skipped=5, lr=[9.76888888888889e-06], mom=[[0.9, 0.999]] [2022-12-16 21:36:28,892] [INFO] [timer.py:197:stop] 0/1220, RunningAvgSamplesPerSec=6.324054379245348, CurrSamplesPerSec=5.6906122599537055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:36:40,250] [INFO] [timer.py:197:stop] 0/1222, RunningAvgSamplesPerSec=6.324061368820376, CurrSamplesPerSec=5.700368410913226, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:36:51,599] [INFO] [timer.py:197:stop] 0/1224, RunningAvgSamplesPerSec=6.324082612140597, CurrSamplesPerSec=5.709710138608091, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:37:02,945] [INFO] [timer.py:197:stop] 0/1226, RunningAvgSamplesPerSec=6.324099967940003, CurrSamplesPerSec=5.710087379174428, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:37:14,312] [INFO] [timer.py:197:stop] 0/1228, RunningAvgSamplesPerSec=6.324095213948072, CurrSamplesPerSec=5.6951648538816695, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-16 21:37:25,643] [INFO] [timer.py:197:stop] 0/1230, RunningAvgSamplesPerSec=6.3241268422499015, CurrSamplesPerSec=5.732377573527465, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:37:36,998] [INFO] [timer.py:197:stop] 0/1232, RunningAvgSamplesPerSec=6.32411884613214, CurrSamplesPerSec=5.675400258437628, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:37:48,363] [INFO] [timer.py:197:stop] 0/1234, RunningAvgSamplesPerSec=6.3241130252354365, CurrSamplesPerSec=5.674158610594783, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:37:59,689] [INFO] [timer.py:197:stop] 0/1236, RunningAvgSamplesPerSec=6.324133915811186, CurrSamplesPerSec=5.713030983651014, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:38:11,060] [INFO] [timer.py:197:stop] 0/1238, RunningAvgSamplesPerSec=6.324125635080051, CurrSamplesPerSec=5.683048536732076, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:38:22,399] [INFO] [logging.py:68:log_dist] [Rank 0] step=620, skipped=5, lr=[9.746666666666668e-06], mom=[[0.9, 0.999]] [2022-12-16 21:38:22,401] [INFO] [timer.py:197:stop] 0/1240, RunningAvgSamplesPerSec=6.3241300316642866, CurrSamplesPerSec=5.6885821924612765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:38:33,727] [INFO] [timer.py:197:stop] 0/1242, RunningAvgSamplesPerSec=6.324153194541566, CurrSamplesPerSec=5.721168331625024, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:38:45,078] [INFO] [timer.py:197:stop] 0/1244, RunningAvgSamplesPerSec=6.324166885839952, CurrSamplesPerSec=5.6995364324043365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:38:56,402] [INFO] [timer.py:197:stop] 0/1246, RunningAvgSamplesPerSec=6.324190757241607, CurrSamplesPerSec=5.701532184211459, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:39:07,779] [INFO] [timer.py:197:stop] 0/1248, RunningAvgSamplesPerSec=6.324176492543799, CurrSamplesPerSec=5.685249002830177, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-16 21:39:19,130] [INFO] [timer.py:197:stop] 0/1250, RunningAvgSamplesPerSec=6.3241874762463635, CurrSamplesPerSec=5.705175011182494, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0339, 'learning_rate': 9.735555555555556e-06, 'epoch': 2.65} [2022-12-16 21:39:30,528] [INFO] [timer.py:197:stop] 0/1252, RunningAvgSamplesPerSec=6.324153428015963, CurrSamplesPerSec=5.670633614253161, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:39:41,877] [INFO] [timer.py:197:stop] 0/1254, RunningAvgSamplesPerSec=6.324166879634101, CurrSamplesPerSec=5.691319276670348, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:39:53,239] [INFO] [timer.py:197:stop] 0/1256, RunningAvgSamplesPerSec=6.324167247062276, CurrSamplesPerSec=5.681665722246253, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:40:04,580] [INFO] [timer.py:197:stop] 0/1258, RunningAvgSamplesPerSec=6.324174890989224, CurrSamplesPerSec=5.681159003145008, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:40:15,934] [INFO] [logging.py:68:log_dist] [Rank 0] step=630, skipped=5, lr=[9.724444444444445e-06], mom=[[0.9, 0.999]] [2022-12-16 21:40:15,936] [INFO] [timer.py:197:stop] 0/1260, RunningAvgSamplesPerSec=6.324179951438752, CurrSamplesPerSec=5.6964870347127015, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:40:27,272] [INFO] [timer.py:197:stop] 0/1262, RunningAvgSamplesPerSec=6.324214385657734, CurrSamplesPerSec=5.709893530081676, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:40:38,622] [INFO] [timer.py:197:stop] 0/1264, RunningAvgSamplesPerSec=6.324239302104727, CurrSamplesPerSec=5.69377396361336, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:40:49,947] [INFO] [timer.py:197:stop] 0/1266, RunningAvgSamplesPerSec=6.324277602074355, CurrSamplesPerSec=5.720856194199715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:41:01,249] [INFO] [timer.py:197:stop] 0/1268, 
RunningAvgSamplesPerSec=6.324335850701666, CurrSamplesPerSec=5.734853355839019, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:41:12,564] [INFO] [timer.py:197:stop] 0/1270, RunningAvgSamplesPerSec=6.324382124670813, CurrSamplesPerSec=5.71265311090979, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:41:23,900] [INFO] [timer.py:197:stop] 0/1272, RunningAvgSamplesPerSec=6.3243761836890755, CurrSamplesPerSec=5.69152514083604, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:41:35,201] [INFO] [timer.py:197:stop] 0/1274, RunningAvgSamplesPerSec=6.324422547556908, CurrSamplesPerSec=5.7167786161089635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:41:46,505] [INFO] [timer.py:197:stop] 0/1276, RunningAvgSamplesPerSec=6.324480723648808, CurrSamplesPerSec=5.724640229146444, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:41:57,832] [INFO] [timer.py:197:stop] 0/1278, RunningAvgSamplesPerSec=6.324515620440186, CurrSamplesPerSec=5.7136565044984415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:42:09,174] [INFO] [logging.py:68:log_dist] [Rank 0] step=640, skipped=5, lr=[9.702222222222223e-06], mom=[[0.9, 0.999]] [2022-12-16 21:42:09,176] [INFO] [timer.py:197:stop] 0/1280, RunningAvgSamplesPerSec=6.32453233632449, CurrSamplesPerSec=5.711460003166846, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:42:20,519] [INFO] [timer.py:197:stop] 0/1282, RunningAvgSamplesPerSec=6.324589203102045, CurrSamplesPerSec=5.729852069808556, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:42:31,881] [INFO] [timer.py:197:stop] 0/1284, RunningAvgSamplesPerSec=6.324574959448107, CurrSamplesPerSec=5.683294472976667, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:42:43,238] [INFO] [timer.py:197:stop] 0/1286, RunningAvgSamplesPerSec=6.324582465903206, CurrSamplesPerSec=5.690985533237665, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:42:54,569] [INFO] [timer.py:197:stop] 0/1288, 
RunningAvgSamplesPerSec=6.3246143105854, CurrSamplesPerSec=5.701603391759544, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:43:06,039] [INFO] [timer.py:197:stop] 0/1290, RunningAvgSamplesPerSec=6.3245165663862775, CurrSamplesPerSec=5.573252155643943, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:43:17,387] [INFO] [timer.py:197:stop] 0/1292, RunningAvgSamplesPerSec=6.324514180927693, CurrSamplesPerSec=5.684713233086082, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:43:28,763] [INFO] [timer.py:197:stop] 0/1294, RunningAvgSamplesPerSec=6.324465620148037, CurrSamplesPerSec=5.665202955258279, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:43:40,109] [INFO] [timer.py:197:stop] 0/1296, RunningAvgSamplesPerSec=6.324476536227191, CurrSamplesPerSec=5.703485950413083, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:43:51,455] [INFO] [timer.py:197:stop] 0/1298, RunningAvgSamplesPerSec=6.32448733334034, CurrSamplesPerSec=5.699092583967394, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:44:02,827] [INFO] [logging.py:68:log_dist] [Rank 0] step=650, skipped=5, lr=[9.68e-06], mom=[[0.9, 0.999]] [2022-12-16 21:44:02,829] [INFO] [timer.py:197:stop] 0/1300, RunningAvgSamplesPerSec=6.324472640902777, CurrSamplesPerSec=5.674295345181151, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0354, 'learning_rate': 9.68e-06, 'epoch': 2.75} [2022-12-16 21:44:14,174] [INFO] [timer.py:197:stop] 0/1302, RunningAvgSamplesPerSec=6.324469450338527, CurrSamplesPerSec=5.7018911464980055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:44:25,494] [INFO] [timer.py:197:stop] 0/1304, RunningAvgSamplesPerSec=6.324474296992287, CurrSamplesPerSec=5.684456580977391, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:44:36,890] [INFO] [timer.py:197:stop] 0/1306, RunningAvgSamplesPerSec=6.324445030815587, CurrSamplesPerSec=5.6593521681943075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:44:48,254] 
[INFO] [timer.py:197:stop] 0/1308, RunningAvgSamplesPerSec=6.3244409931410575, CurrSamplesPerSec=5.66902528072435, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:44:59,617] [INFO] [timer.py:197:stop] 0/1310, RunningAvgSamplesPerSec=6.324437449468802, CurrSamplesPerSec=5.692400893631308, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:45:10,999] [INFO] [timer.py:197:stop] 0/1312, RunningAvgSamplesPerSec=6.324450667657297, CurrSamplesPerSec=5.703528849436146, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:45:22,350] [INFO] [timer.py:197:stop] 0/1314, RunningAvgSamplesPerSec=6.324459340859516, CurrSamplesPerSec=5.700481716272428, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:45:33,715] [INFO] [timer.py:197:stop] 0/1316, RunningAvgSamplesPerSec=6.324453866475892, CurrSamplesPerSec=5.684826157724787, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:45:45,083] [INFO] [timer.py:197:stop] 0/1318, RunningAvgSamplesPerSec=6.324431190376394, CurrSamplesPerSec=5.692041676866035, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:45:56,680] [INFO] [logging.py:68:log_dist] [Rank 0] step=660, skipped=5, lr=[9.657777777777778e-06], mom=[[0.9, 0.999]] [2022-12-16 21:45:56,682] [INFO] [timer.py:197:stop] 0/1320, RunningAvgSamplesPerSec=6.32442696063329, CurrSamplesPerSec=5.687795349133654, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:46:08,055] [INFO] [timer.py:197:stop] 0/1322, RunningAvgSamplesPerSec=6.324400513545528, CurrSamplesPerSec=5.6830059452395725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:46:19,405] [INFO] [timer.py:197:stop] 0/1324, RunningAvgSamplesPerSec=6.324409371583161, CurrSamplesPerSec=5.70627475944235, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:46:30,768] [INFO] [timer.py:197:stop] 0/1326, RunningAvgSamplesPerSec=6.3243939378709815, CurrSamplesPerSec=5.6951795951100905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:46:42,121] [INFO] 
[timer.py:197:stop] 0/1328, RunningAvgSamplesPerSec=6.324402324882206, CurrSamplesPerSec=5.699022649051794, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:46:53,470] [INFO] [timer.py:197:stop] 0/1330, RunningAvgSamplesPerSec=6.324400154231076, CurrSamplesPerSec=5.6892438493070046, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:47:04,850] [INFO] [timer.py:197:stop] 0/1332, RunningAvgSamplesPerSec=6.324392944136967, CurrSamplesPerSec=5.685559434291239, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:47:16,202] [INFO] [timer.py:197:stop] 0/1334, RunningAvgSamplesPerSec=6.3243919800862995, CurrSamplesPerSec=5.687262473216299, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:47:27,578] [INFO] [timer.py:197:stop] 0/1336, RunningAvgSamplesPerSec=6.3243769326191455, CurrSamplesPerSec=5.686703434412646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:47:39,062] [INFO] [timer.py:197:stop] 0/1338, RunningAvgSamplesPerSec=6.324378950957167, CurrSamplesPerSec=5.691266183990615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:47:50,436] [INFO] [logging.py:68:log_dist] [Rank 0] step=670, skipped=5, lr=[9.635555555555557e-06], mom=[[0.9, 0.999]] [2022-12-16 21:47:50,438] [INFO] [timer.py:197:stop] 0/1340, RunningAvgSamplesPerSec=6.324378271402667, CurrSamplesPerSec=5.691395056120047, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:48:01,811] [INFO] [timer.py:197:stop] 0/1342, RunningAvgSamplesPerSec=6.324364127126357, CurrSamplesPerSec=5.663703573734233, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:48:13,192] [INFO] [timer.py:197:stop] 0/1344, RunningAvgSamplesPerSec=6.324348771990201, CurrSamplesPerSec=5.698139777624442, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:48:24,528] [INFO] [timer.py:197:stop] 0/1346, RunningAvgSamplesPerSec=6.324374215372536, CurrSamplesPerSec=5.702518106714946, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:48:35,895] [INFO] 
[timer.py:197:stop] 0/1348, RunningAvgSamplesPerSec=6.3243590936320375, CurrSamplesPerSec=5.687893692645465, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:48:47,295] [INFO] [timer.py:197:stop] 0/1350, RunningAvgSamplesPerSec=6.324326854499699, CurrSamplesPerSec=5.655780951123609, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0374, 'learning_rate': 9.624444444444445e-06, 'epoch': 2.86} [2022-12-16 21:48:58,904] [INFO] [timer.py:197:stop] 0/1352, RunningAvgSamplesPerSec=6.324318267548198, CurrSamplesPerSec=5.6796292936049895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:49:10,273] [INFO] [timer.py:197:stop] 0/1354, RunningAvgSamplesPerSec=6.324296695879865, CurrSamplesPerSec=5.667203924328584, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:49:21,654] [INFO] [timer.py:197:stop] 0/1356, RunningAvgSamplesPerSec=6.32428018514822, CurrSamplesPerSec=5.6751492458016735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:49:33,006] [INFO] [timer.py:197:stop] 0/1358, RunningAvgSamplesPerSec=6.324275196063561, CurrSamplesPerSec=5.69712223815206, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:49:44,386] [INFO] [logging.py:68:log_dist] [Rank 0] step=680, skipped=5, lr=[9.613333333333335e-06], mom=[[0.9, 0.999]] [2022-12-16 21:49:44,388] [INFO] [timer.py:197:stop] 0/1360, RunningAvgSamplesPerSec=6.324255960679552, CurrSamplesPerSec=5.670191860369816, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:49:55,738] [INFO] [timer.py:197:stop] 0/1362, RunningAvgSamplesPerSec=6.324254309513514, CurrSamplesPerSec=5.6942959812107965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:50:07,114] [INFO] [timer.py:197:stop] 0/1364, RunningAvgSamplesPerSec=6.324256979747847, CurrSamplesPerSec=5.6949874818050725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:50:18,496] [INFO] [timer.py:197:stop] 0/1366, RunningAvgSamplesPerSec=6.324236172921695, CurrSamplesPerSec=5.687761122532687, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:50:29,947] [INFO] [timer.py:197:stop] 0/1368, RunningAvgSamplesPerSec=6.3242387362212655, CurrSamplesPerSec=5.692255077058373, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:50:41,282] [INFO] [timer.py:197:stop] 0/1370, RunningAvgSamplesPerSec=6.324249802937796, CurrSamplesPerSec=5.703811950695486, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:50:52,601] [INFO] [timer.py:197:stop] 0/1372, RunningAvgSamplesPerSec=6.32427068603822, CurrSamplesPerSec=5.704097261816972, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:51:03,926] [INFO] [timer.py:197:stop] 0/1374, RunningAvgSamplesPerSec=6.3242886615535285, CurrSamplesPerSec=5.698351457952698, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:51:15,332] [INFO] [timer.py:197:stop] 0/1376, RunningAvgSamplesPerSec=6.324278565827469, CurrSamplesPerSec=5.688882378479673, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:51:26,682] [INFO] [timer.py:197:stop] 0/1378, RunningAvgSamplesPerSec=6.324272743331345, CurrSamplesPerSec=5.677133235207755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:51:38,045] [INFO] [logging.py:68:log_dist] [Rank 0] step=690, skipped=5, lr=[9.591111111111113e-06], mom=[[0.9, 0.999]] [2022-12-16 21:51:38,047] [INFO] [timer.py:197:stop] 0/1380, RunningAvgSamplesPerSec=6.324266336132586, CurrSamplesPerSec=5.672529097773243, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:51:49,338] [INFO] [timer.py:197:stop] 0/1382, RunningAvgSamplesPerSec=6.3242997387905024, CurrSamplesPerSec=5.70745550803231, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:52:00,686] [INFO] [timer.py:197:stop] 0/1384, RunningAvgSamplesPerSec=6.324323307663187, CurrSamplesPerSec=5.692134373626455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:52:12,025] [INFO] [timer.py:197:stop] 0/1386, RunningAvgSamplesPerSec=6.324327041155843, CurrSamplesPerSec=5.6880930415204345, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:52:23,323] [INFO] [timer.py:197:stop] 0/1388, RunningAvgSamplesPerSec=6.324380802604177, CurrSamplesPerSec=5.735462831952154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:52:34,646] [INFO] [timer.py:197:stop] 0/1390, RunningAvgSamplesPerSec=6.324399578264787, CurrSamplesPerSec=5.719052804513825, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:52:45,967] [INFO] [timer.py:197:stop] 0/1392, RunningAvgSamplesPerSec=6.3244345536156, CurrSamplesPerSec=5.704161503176972, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:52:57,335] [INFO] [timer.py:197:stop] 0/1394, RunningAvgSamplesPerSec=6.32447811176164, CurrSamplesPerSec=5.718143983956547, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:53:08,631] [INFO] [timer.py:197:stop] 0/1396, RunningAvgSamplesPerSec=6.324519518597146, CurrSamplesPerSec=5.714886786951273, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:53:19,938] [INFO] [timer.py:197:stop] 0/1398, RunningAvgSamplesPerSec=6.32455268106817, CurrSamplesPerSec=5.7012866044767625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:53:31,265] [INFO] [logging.py:68:log_dist] [Rank 0] step=700, skipped=5, lr=[9.56888888888889e-06], mom=[[0.9, 0.999]] [2022-12-16 21:53:31,267] [INFO] [timer.py:197:stop] 0/1400, RunningAvgSamplesPerSec=6.324579534160159, CurrSamplesPerSec=5.692990030015316, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0371, 'learning_rate': 9.56888888888889e-06, 'epoch': 2.97} [2022-12-16 21:53:42,558] [INFO] [timer.py:197:stop] 0/1402, RunningAvgSamplesPerSec=6.32462746208894, CurrSamplesPerSec=5.7200996461464815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:53:53,895] [INFO] [timer.py:197:stop] 0/1404, RunningAvgSamplesPerSec=6.324632008464894, CurrSamplesPerSec=5.690293315377784, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:54:05,179] [INFO] [timer.py:197:stop] 0/1406, 
RunningAvgSamplesPerSec=6.324698640337143, CurrSamplesPerSec=5.726335492400019, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:54:16,499] [INFO] [timer.py:197:stop] 0/1408, RunningAvgSamplesPerSec=6.324731919832654, CurrSamplesPerSec=5.702019773539062, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:54:27,822] [INFO] [timer.py:197:stop] 0/1410, RunningAvgSamplesPerSec=6.3247630088837274, CurrSamplesPerSec=5.724249100746272, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:54:39,170] [INFO] [timer.py:197:stop] 0/1412, RunningAvgSamplesPerSec=6.3247716831156175, CurrSamplesPerSec=5.682469154661037, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:54:50,513] [INFO] [timer.py:197:stop] 0/1414, RunningAvgSamplesPerSec=6.32478804755692, CurrSamplesPerSec=5.704216776146671, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:54:59,057] [INFO] [timer.py:197:stop] 0/1416, RunningAvgSamplesPerSec=6.32697491966737, CurrSamplesPerSec=10.183929802384258, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:55:10,405] [INFO] [timer.py:197:stop] 0/1418, RunningAvgSamplesPerSec=6.326984384492439, CurrSamplesPerSec=5.698068656563964, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:55:21,716] [INFO] [logging.py:68:log_dist] [Rank 0] step=710, skipped=5, lr=[9.546666666666668e-06], mom=[[0.9, 0.999]] [2022-12-16 21:55:21,718] [INFO] [timer.py:197:stop] 0/1420, RunningAvgSamplesPerSec=6.327008366408901, CurrSamplesPerSec=5.722761494483829, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:55:33,025] [INFO] [timer.py:197:stop] 0/1422, RunningAvgSamplesPerSec=6.32703881467068, CurrSamplesPerSec=5.725638068365156, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:55:44,329] [INFO] [timer.py:197:stop] 0/1424, RunningAvgSamplesPerSec=6.3270593471696595, CurrSamplesPerSec=5.7123326634216784, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:55:55,642] [INFO] [timer.py:197:stop] 0/1426, 
RunningAvgSamplesPerSec=6.327096793945078, CurrSamplesPerSec=5.710449849131292, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:56:06,990] [INFO] [timer.py:197:stop] 0/1428, RunningAvgSamplesPerSec=6.327105470684101, CurrSamplesPerSec=5.6794182809613725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:56:18,294] [INFO] [timer.py:197:stop] 0/1430, RunningAvgSamplesPerSec=6.3271388039928596, CurrSamplesPerSec=5.714867320181734, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:56:29,628] [INFO] [timer.py:197:stop] 0/1432, RunningAvgSamplesPerSec=6.3271327648059374, CurrSamplesPerSec=5.6914160526940245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:56:40,960] [INFO] [timer.py:197:stop] 0/1434, RunningAvgSamplesPerSec=6.327141902219056, CurrSamplesPerSec=5.689690747652517, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:56:52,302] [INFO] [timer.py:197:stop] 0/1436, RunningAvgSamplesPerSec=6.327156027945619, CurrSamplesPerSec=5.689736092584112, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:57:03,628] [INFO] [timer.py:197:stop] 0/1438, RunningAvgSamplesPerSec=6.3271860924201, CurrSamplesPerSec=5.701005206768365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:57:14,933] [INFO] [logging.py:68:log_dist] [Rank 0] step=720, skipped=5, lr=[9.524444444444445e-06], mom=[[0.9, 0.999]] [2022-12-16 21:57:14,935] [INFO] [timer.py:197:stop] 0/1440, RunningAvgSamplesPerSec=6.327240541450005, CurrSamplesPerSec=5.733791309617626, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:57:26,263] [INFO] [timer.py:197:stop] 0/1442, RunningAvgSamplesPerSec=6.327265439071706, CurrSamplesPerSec=5.709909562214138, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:57:37,604] [INFO] [timer.py:197:stop] 0/1444, RunningAvgSamplesPerSec=6.327289199225549, CurrSamplesPerSec=5.7069866443111605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:57:49,080] [INFO] [timer.py:197:stop] 0/1446, 
RunningAvgSamplesPerSec=6.327306402988852, CurrSamplesPerSec=5.691615407194863, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:58:00,419] [INFO] [timer.py:197:stop] 0/1448, RunningAvgSamplesPerSec=6.327318797851507, CurrSamplesPerSec=5.702858777371699, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:58:11,758] [INFO] [timer.py:197:stop] 0/1450, RunningAvgSamplesPerSec=6.3273351798091335, CurrSamplesPerSec=5.695963648425355, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0227, 'learning_rate': 9.513333333333334e-06, 'epoch': 3.07} [2022-12-16 21:58:23,085] [INFO] [timer.py:197:stop] 0/1452, RunningAvgSamplesPerSec=6.32736439156718, CurrSamplesPerSec=5.69986222339413, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:58:34,423] [INFO] [timer.py:197:stop] 0/1454, RunningAvgSamplesPerSec=6.327382444473141, CurrSamplesPerSec=5.691851947598792, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:58:45,728] [INFO] [timer.py:197:stop] 0/1456, RunningAvgSamplesPerSec=6.327411355372915, CurrSamplesPerSec=5.7264754862424025, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:58:57,189] [INFO] [timer.py:197:stop] 0/1458, RunningAvgSamplesPerSec=6.327422689770682, CurrSamplesPerSec=5.687144391209955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:59:08,503] [INFO] [logging.py:68:log_dist] [Rank 0] step=730, skipped=5, lr=[9.502222222222223e-06], mom=[[0.9, 0.999]] [2022-12-16 21:59:08,505] [INFO] [timer.py:197:stop] 0/1460, RunningAvgSamplesPerSec=6.327457442468501, CurrSamplesPerSec=5.7197886003305705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:59:19,811] [INFO] [timer.py:197:stop] 0/1462, RunningAvgSamplesPerSec=6.327485371631291, CurrSamplesPerSec=5.720289800531883, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:59:31,139] [INFO] [timer.py:197:stop] 0/1464, RunningAvgSamplesPerSec=6.327510730275065, CurrSamplesPerSec=5.717306565446505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-16 21:59:42,470] [INFO] [timer.py:197:stop] 0/1466, RunningAvgSamplesPerSec=6.327533540895815, CurrSamplesPerSec=5.704577773769327, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 21:59:53,873] [INFO] [timer.py:197:stop] 0/1468, RunningAvgSamplesPerSec=6.327495467322438, CurrSamplesPerSec=5.645587947967342, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:00:05,302] [INFO] [timer.py:197:stop] 0/1470, RunningAvgSamplesPerSec=6.327436755212707, CurrSamplesPerSec=5.624859129929112, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:00:16,701] [INFO] [timer.py:197:stop] 0/1472, RunningAvgSamplesPerSec=6.3274364325893995, CurrSamplesPerSec=5.687119088560816, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:00:28,151] [INFO] [timer.py:197:stop] 0/1474, RunningAvgSamplesPerSec=6.327424697035128, CurrSamplesPerSec=5.67494696401265, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:00:39,471] [INFO] [timer.py:197:stop] 0/1476, RunningAvgSamplesPerSec=6.327443763191702, CurrSamplesPerSec=5.70665906786341, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:00:50,872] [INFO] [timer.py:197:stop] 0/1478, RunningAvgSamplesPerSec=6.327434800015111, CurrSamplesPerSec=5.677509785642671, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:01:02,243] [INFO] [logging.py:68:log_dist] [Rank 0] step=740, skipped=5, lr=[9.48e-06], mom=[[0.9, 0.999]] [2022-12-16 22:01:02,245] [INFO] [timer.py:197:stop] 0/1480, RunningAvgSamplesPerSec=6.327453438484531, CurrSamplesPerSec=5.707022801349469, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:01:13,631] [INFO] [timer.py:197:stop] 0/1482, RunningAvgSamplesPerSec=6.327449584064208, CurrSamplesPerSec=5.686699097472274, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:01:25,011] [INFO] [timer.py:197:stop] 0/1484, RunningAvgSamplesPerSec=6.327461812433459, CurrSamplesPerSec=5.6979624620946625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:01:36,382] 
[INFO] [timer.py:197:stop] 0/1486, RunningAvgSamplesPerSec=6.327474119248579, CurrSamplesPerSec=5.7187698940143425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:01:47,707] [INFO] [timer.py:197:stop] 0/1488, RunningAvgSamplesPerSec=6.327500660405285, CurrSamplesPerSec=5.706598409685004, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:01:59,104] [INFO] [timer.py:197:stop] 0/1490, RunningAvgSamplesPerSec=6.327497573482637, CurrSamplesPerSec=5.683719497130948, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:02:10,486] [INFO] [timer.py:197:stop] 0/1492, RunningAvgSamplesPerSec=6.32748801533402, CurrSamplesPerSec=5.683161635883977, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:02:21,823] [INFO] [timer.py:197:stop] 0/1494, RunningAvgSamplesPerSec=6.32750880749429, CurrSamplesPerSec=5.711333623231045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:02:33,231] [INFO] [timer.py:197:stop] 0/1496, RunningAvgSamplesPerSec=6.327513699728768, CurrSamplesPerSec=5.706028528414905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:02:44,547] [INFO] [timer.py:197:stop] 0/1498, RunningAvgSamplesPerSec=6.327550204303378, CurrSamplesPerSec=5.724850952803729, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:02:56,026] [INFO] [logging.py:68:log_dist] [Rank 0] step=750, skipped=5, lr=[9.457777777777778e-06], mom=[[0.9, 0.999]] [2022-12-16 22:02:56,027] [INFO] [timer.py:197:stop] 0/1500, RunningAvgSamplesPerSec=6.327458217506727, CurrSamplesPerSec=5.5676383899106145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0173, 'learning_rate': 9.457777777777778e-06, 'epoch': 3.18} [2022-12-16 22:03:07,403] [INFO] [timer.py:197:stop] 0/1502, RunningAvgSamplesPerSec=6.327455446968343, CurrSamplesPerSec=5.696816345954642, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:03:18,794] [INFO] [timer.py:197:stop] 0/1504, RunningAvgSamplesPerSec=6.327460698300773, CurrSamplesPerSec=5.6907456868615665, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:03:30,167] [INFO] [timer.py:197:stop] 0/1506, RunningAvgSamplesPerSec=6.327476845175162, CurrSamplesPerSec=5.700586309832747, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:03:41,557] [INFO] [timer.py:197:stop] 0/1508, RunningAvgSamplesPerSec=6.327479752759413, CurrSamplesPerSec=5.698833422003168, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:03:53,085] [INFO] [timer.py:197:stop] 0/1510, RunningAvgSamplesPerSec=6.327483658107985, CurrSamplesPerSec=5.690109492159518, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:04:04,525] [INFO] [timer.py:197:stop] 0/1512, RunningAvgSamplesPerSec=6.327482981825684, CurrSamplesPerSec=5.691443324337489, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:04:15,856] [INFO] [timer.py:197:stop] 0/1514, RunningAvgSamplesPerSec=6.327504069039711, CurrSamplesPerSec=5.702331554684467, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:04:27,180] [INFO] [timer.py:197:stop] 0/1516, RunningAvgSamplesPerSec=6.32752931475808, CurrSamplesPerSec=5.699813812273017, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:04:38,838] [INFO] [timer.py:197:stop] 0/1518, RunningAvgSamplesPerSec=6.327495958559439, CurrSamplesPerSec=5.70111127252215, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:04:50,193] [INFO] [logging.py:68:log_dist] [Rank 0] step=760, skipped=5, lr=[9.435555555555556e-06], mom=[[0.9, 0.999]] [2022-12-16 22:04:50,195] [INFO] [timer.py:197:stop] 0/1520, RunningAvgSamplesPerSec=6.327491850184652, CurrSamplesPerSec=5.6779257782447745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:05:01,599] [INFO] [timer.py:197:stop] 0/1522, RunningAvgSamplesPerSec=6.3275112010285595, CurrSamplesPerSec=5.704700217859312, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:05:13,081] [INFO] [timer.py:197:stop] 0/1524, RunningAvgSamplesPerSec=6.327502103613446, CurrSamplesPerSec=5.698904562156051, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:05:24,417] [INFO] [timer.py:197:stop] 0/1526, RunningAvgSamplesPerSec=6.327493535489943, CurrSamplesPerSec=5.677983426370301, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:05:35,707] [INFO] [timer.py:197:stop] 0/1528, RunningAvgSamplesPerSec=6.327507103762715, CurrSamplesPerSec=5.697476050095679, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:05:47,074] [INFO] [timer.py:197:stop] 0/1530, RunningAvgSamplesPerSec=6.327507935110874, CurrSamplesPerSec=5.715472068638608, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:05:58,447] [INFO] [timer.py:197:stop] 0/1532, RunningAvgSamplesPerSec=6.32751138193243, CurrSamplesPerSec=5.6919378793894335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:06:09,787] [INFO] [timer.py:197:stop] 0/1534, RunningAvgSamplesPerSec=6.327521338018502, CurrSamplesPerSec=5.7004705792212755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:06:21,190] [INFO] [timer.py:197:stop] 0/1536, RunningAvgSamplesPerSec=6.3274682129513, CurrSamplesPerSec=5.712174397994211, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:06:32,541] [INFO] [timer.py:197:stop] 0/1538, RunningAvgSamplesPerSec=6.3274698474125985, CurrSamplesPerSec=5.68027300306161, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:06:43,888] [INFO] [logging.py:68:log_dist] [Rank 0] step=770, skipped=5, lr=[9.413333333333334e-06], mom=[[0.9, 0.999]] [2022-12-16 22:06:43,890] [INFO] [timer.py:197:stop] 0/1540, RunningAvgSamplesPerSec=6.327473245108575, CurrSamplesPerSec=5.6981792094536186, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:06:55,564] [INFO] [timer.py:197:stop] 0/1542, RunningAvgSamplesPerSec=6.327513329572601, CurrSamplesPerSec=5.718835197312315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:07:07,225] [INFO] [timer.py:197:stop] 0/1544, RunningAvgSamplesPerSec=6.327515501200996, CurrSamplesPerSec=5.701398008835235, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:07:18,958] [INFO] [timer.py:197:stop] 0/1546, RunningAvgSamplesPerSec=6.3272162957930815, CurrSamplesPerSec=5.309439632999244, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:07:30,285] [INFO] [timer.py:197:stop] 0/1548, RunningAvgSamplesPerSec=6.327237769724863, CurrSamplesPerSec=5.715100200108538, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:07:41,816] [INFO] [timer.py:197:stop] 0/1550, RunningAvgSamplesPerSec=6.327274161216004, CurrSamplesPerSec=5.71891780324732, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0172, 'learning_rate': 9.402222222222222e-06, 'epoch': 3.28} [2022-12-16 22:07:53,410] [INFO] [timer.py:197:stop] 0/1552, RunningAvgSamplesPerSec=6.32706992914816, CurrSamplesPerSec=5.411217698198677, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:08:04,941] [INFO] [timer.py:197:stop] 0/1554, RunningAvgSamplesPerSec=6.327092359809133, CurrSamplesPerSec=5.700676379585338, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:08:16,285] [INFO] [timer.py:197:stop] 0/1556, RunningAvgSamplesPerSec=6.327101079548158, CurrSamplesPerSec=5.68027853219152, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:08:27,892] [INFO] [timer.py:197:stop] 0/1558, RunningAvgSamplesPerSec=6.326898704258696, CurrSamplesPerSec=5.425364903600726, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:08:39,298] [INFO] [logging.py:68:log_dist] [Rank 0] step=780, skipped=5, lr=[9.391111111111111e-06], mom=[[0.9, 0.999]] [2022-12-16 22:08:39,300] [INFO] [timer.py:197:stop] 0/1560, RunningAvgSamplesPerSec=6.326863672241038, CurrSamplesPerSec=5.64888880789383, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:08:50,744] [INFO] [timer.py:197:stop] 0/1562, RunningAvgSamplesPerSec=6.326881633242561, CurrSamplesPerSec=5.688473217102632, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:09:02,169] [INFO] [timer.py:197:stop] 0/1564, 
RunningAvgSamplesPerSec=6.326828353757074, CurrSamplesPerSec=5.6164079889420755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:09:13,635] [INFO] [timer.py:197:stop] 0/1566, RunningAvgSamplesPerSec=6.32684329058874, CurrSamplesPerSec=5.711484793733482, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:09:25,042] [INFO] [timer.py:197:stop] 0/1568, RunningAvgSamplesPerSec=6.326860661533723, CurrSamplesPerSec=5.69864977247261, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:09:36,536] [INFO] [timer.py:197:stop] 0/1570, RunningAvgSamplesPerSec=6.326751882842448, CurrSamplesPerSec=5.55672099720942, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:09:48,053] [INFO] [timer.py:197:stop] 0/1572, RunningAvgSamplesPerSec=6.326774718534809, CurrSamplesPerSec=5.70892423918322, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:09:59,564] [INFO] [timer.py:197:stop] 0/1574, RunningAvgSamplesPerSec=6.32673483292785, CurrSamplesPerSec=5.659623979889483, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:10:10,906] [INFO] [timer.py:197:stop] 0/1576, RunningAvgSamplesPerSec=6.326735484116569, CurrSamplesPerSec=5.687612892361142, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:10:22,483] [INFO] [timer.py:197:stop] 0/1578, RunningAvgSamplesPerSec=6.326743472057318, CurrSamplesPerSec=5.687099328553298, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:10:34,085] [INFO] [logging.py:68:log_dist] [Rank 0] step=790, skipped=5, lr=[9.368888888888889e-06], mom=[[0.9, 0.999]] [2022-12-16 22:10:34,086] [INFO] [timer.py:197:stop] 0/1580, RunningAvgSamplesPerSec=6.326741989967983, CurrSamplesPerSec=5.6911527620388345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:10:45,432] [INFO] [timer.py:197:stop] 0/1582, RunningAvgSamplesPerSec=6.326739758130946, CurrSamplesPerSec=5.690173660070088, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:10:56,913] [INFO] [timer.py:197:stop] 0/1584, 
RunningAvgSamplesPerSec=6.326729646225332, CurrSamplesPerSec=5.6936201062598, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:11:08,620] [INFO] [timer.py:197:stop] 0/1586, RunningAvgSamplesPerSec=6.326719942273098, CurrSamplesPerSec=5.678790863330024, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:11:19,982] [INFO] [timer.py:197:stop] 0/1588, RunningAvgSamplesPerSec=6.326714577519651, CurrSamplesPerSec=5.6897175203704835, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:11:31,391] [INFO] [timer.py:197:stop] 0/1590, RunningAvgSamplesPerSec=6.326709002994419, CurrSamplesPerSec=5.695294385974962, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:11:42,904] [INFO] [timer.py:197:stop] 0/1592, RunningAvgSamplesPerSec=6.326720212022566, CurrSamplesPerSec=5.700918516686109, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:11:54,288] [INFO] [timer.py:197:stop] 0/1594, RunningAvgSamplesPerSec=6.326701126454511, CurrSamplesPerSec=5.660456759611574, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:12:05,611] [INFO] [timer.py:197:stop] 0/1596, RunningAvgSamplesPerSec=6.326729466778808, CurrSamplesPerSec=5.713571618248627, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:12:16,916] [INFO] [timer.py:197:stop] 0/1598, RunningAvgSamplesPerSec=6.3267459876247045, CurrSamplesPerSec=5.7029544923126725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:12:28,446] [INFO] [logging.py:68:log_dist] [Rank 0] step=800, skipped=5, lr=[9.346666666666666e-06], mom=[[0.9, 0.999]] [2022-12-16 22:12:28,448] [INFO] [timer.py:197:stop] 0/1600, RunningAvgSamplesPerSec=6.326609933797941, CurrSamplesPerSec=5.5002245491774175, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0166, 'learning_rate': 9.346666666666666e-06, 'epoch': 3.39} [2022-12-16 22:12:39,770] [INFO] [timer.py:197:stop] 0/1602, RunningAvgSamplesPerSec=6.326638422676115, CurrSamplesPerSec=5.707492641836485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-16 22:12:51,065] [INFO] [timer.py:197:stop] 0/1604, RunningAvgSamplesPerSec=6.3266746590610845, CurrSamplesPerSec=5.729000214488998, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:13:02,571] [INFO] [timer.py:197:stop] 0/1606, RunningAvgSamplesPerSec=6.326559939629545, CurrSamplesPerSec=5.546773415140327, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:13:13,872] [INFO] [timer.py:197:stop] 0/1608, RunningAvgSamplesPerSec=6.326604435719887, CurrSamplesPerSec=5.730451430762789, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:13:25,198] [INFO] [timer.py:197:stop] 0/1610, RunningAvgSamplesPerSec=6.326630818586847, CurrSamplesPerSec=5.708273048903529, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:13:36,873] [INFO] [timer.py:197:stop] 0/1612, RunningAvgSamplesPerSec=6.32665586766633, CurrSamplesPerSec=5.703729295796363, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:13:48,186] [INFO] [timer.py:197:stop] 0/1614, RunningAvgSamplesPerSec=6.3266910772859575, CurrSamplesPerSec=5.71860006368878, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:13:59,799] [INFO] [timer.py:197:stop] 0/1616, RunningAvgSamplesPerSec=6.326687842512467, CurrSamplesPerSec=5.6908894957427, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:14:11,397] [INFO] [timer.py:197:stop] 0/1618, RunningAvgSamplesPerSec=6.326692582176606, CurrSamplesPerSec=5.701328259509893, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:14:22,723] [INFO] [logging.py:68:log_dist] [Rank 0] step=810, skipped=5, lr=[9.324444444444444e-06], mom=[[0.9, 0.999]] [2022-12-16 22:14:22,725] [INFO] [timer.py:197:stop] 0/1620, RunningAvgSamplesPerSec=6.326714089116822, CurrSamplesPerSec=5.699270695864197, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:14:34,324] [INFO] [timer.py:197:stop] 0/1622, RunningAvgSamplesPerSec=6.326731860344668, CurrSamplesPerSec=5.691813568788928, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 
22:14:46,047] [INFO] [timer.py:197:stop] 0/1624, RunningAvgSamplesPerSec=6.326712534725961, CurrSamplesPerSec=5.690206468282752, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:14:57,368] [INFO] [timer.py:197:stop] 0/1626, RunningAvgSamplesPerSec=6.326740345466834, CurrSamplesPerSec=5.712110948419259, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:15:08,686] [INFO] [timer.py:197:stop] 0/1628, RunningAvgSamplesPerSec=6.326758798477797, CurrSamplesPerSec=5.6973323918270635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:15:20,061] [INFO] [timer.py:197:stop] 0/1630, RunningAvgSamplesPerSec=6.326747846596995, CurrSamplesPerSec=5.691386609266169, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:15:31,412] [INFO] [timer.py:197:stop] 0/1632, RunningAvgSamplesPerSec=6.3267535500364955, CurrSamplesPerSec=5.690381612642104, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:15:42,701] [INFO] [timer.py:197:stop] 0/1634, RunningAvgSamplesPerSec=6.326795979692171, CurrSamplesPerSec=5.725662737910497, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:15:54,393] [INFO] [timer.py:197:stop] 0/1636, RunningAvgSamplesPerSec=6.326834150424974, CurrSamplesPerSec=5.719259705105001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:16:05,823] [INFO] [timer.py:197:stop] 0/1638, RunningAvgSamplesPerSec=6.326843525216495, CurrSamplesPerSec=5.697779594336017, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:16:17,363] [INFO] [logging.py:68:log_dist] [Rank 0] step=820, skipped=5, lr=[9.302222222222223e-06], mom=[[0.9, 0.999]] [2022-12-16 22:16:17,364] [INFO] [timer.py:197:stop] 0/1640, RunningAvgSamplesPerSec=6.326679026146229, CurrSamplesPerSec=5.458330096859955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:16:28,822] [INFO] [timer.py:197:stop] 0/1642, RunningAvgSamplesPerSec=6.326724094955409, CurrSamplesPerSec=5.730091554512378, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 
22:16:40,240] [INFO] [timer.py:197:stop] 0/1644, RunningAvgSamplesPerSec=6.326761135677585, CurrSamplesPerSec=5.722593622930806, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:16:51,697] [INFO] [timer.py:197:stop] 0/1646, RunningAvgSamplesPerSec=6.326697672667059, CurrSamplesPerSec=5.612376038068795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:17:03,218] [INFO] [timer.py:197:stop] 0/1648, RunningAvgSamplesPerSec=6.326712992902165, CurrSamplesPerSec=5.70818492382536, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:17:14,675] [INFO] [timer.py:197:stop] 0/1650, RunningAvgSamplesPerSec=6.32673517454783, CurrSamplesPerSec=5.714327900088595, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0196, 'learning_rate': 9.291111111111112e-06, 'epoch': 3.5} [2022-12-16 22:17:26,082] [INFO] [timer.py:197:stop] 0/1652, RunningAvgSamplesPerSec=6.326699396447215, CurrSamplesPerSec=5.621307520169431, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:17:37,569] [INFO] [timer.py:197:stop] 0/1654, RunningAvgSamplesPerSec=6.3267304307190875, CurrSamplesPerSec=5.707525164650132, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:17:49,057] [INFO] [timer.py:197:stop] 0/1656, RunningAvgSamplesPerSec=6.326770394708087, CurrSamplesPerSec=5.701385415080797, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:18:00,478] [INFO] [timer.py:197:stop] 0/1658, RunningAvgSamplesPerSec=6.326760667852926, CurrSamplesPerSec=5.662375782745366, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:18:11,888] [INFO] [logging.py:68:log_dist] [Rank 0] step=830, skipped=5, lr=[9.280000000000001e-06], mom=[[0.9, 0.999]] [2022-12-16 22:18:11,890] [INFO] [timer.py:197:stop] 0/1660, RunningAvgSamplesPerSec=6.326821622093543, CurrSamplesPerSec=5.7435494700991665, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:18:23,408] [INFO] [timer.py:197:stop] 0/1662, RunningAvgSamplesPerSec=6.326843356419004, CurrSamplesPerSec=5.699876262773075, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:18:34,855] [INFO] [timer.py:197:stop] 0/1664, RunningAvgSamplesPerSec=6.326798218964038, CurrSamplesPerSec=5.607935810568727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:18:46,396] [INFO] [timer.py:197:stop] 0/1666, RunningAvgSamplesPerSec=6.326834587271337, CurrSamplesPerSec=5.716924474281732, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:18:57,827] [INFO] [timer.py:197:stop] 0/1668, RunningAvgSamplesPerSec=6.326838624746808, CurrSamplesPerSec=5.694305403046297, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:19:09,201] [INFO] [timer.py:197:stop] 0/1670, RunningAvgSamplesPerSec=6.326862994715842, CurrSamplesPerSec=5.696425141989684, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:19:20,826] [INFO] [timer.py:197:stop] 0/1672, RunningAvgSamplesPerSec=6.326887429548338, CurrSamplesPerSec=5.722593622930806, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:19:32,408] [INFO] [timer.py:197:stop] 0/1674, RunningAvgSamplesPerSec=6.326924458197643, CurrSamplesPerSec=5.73617883162889, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:19:43,753] [INFO] [timer.py:197:stop] 0/1676, RunningAvgSamplesPerSec=6.326932090614424, CurrSamplesPerSec=5.685106201292763, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:19:55,356] [INFO] [timer.py:197:stop] 0/1678, RunningAvgSamplesPerSec=6.326963550525718, CurrSamplesPerSec=5.708342482699776, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:20:06,777] [INFO] [logging.py:68:log_dist] [Rank 0] step=840, skipped=5, lr=[9.257777777777779e-06], mom=[[0.9, 0.999]] [2022-12-16 22:20:06,779] [INFO] [timer.py:197:stop] 0/1680, RunningAvgSamplesPerSec=6.326964525629489, CurrSamplesPerSec=5.694321106174752, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:20:18,115] [INFO] [timer.py:197:stop] 0/1682, RunningAvgSamplesPerSec=6.326980760880848, CurrSamplesPerSec=5.6901266195429265, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:20:29,491] [INFO] [timer.py:197:stop] 0/1684, RunningAvgSamplesPerSec=6.327030240339728, CurrSamplesPerSec=5.727282599924787, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:20:40,822] [INFO] [timer.py:197:stop] 0/1686, RunningAvgSamplesPerSec=6.327050807508987, CurrSamplesPerSec=5.70611585915527, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:20:52,189] [INFO] [timer.py:197:stop] 0/1688, RunningAvgSamplesPerSec=6.327042771339603, CurrSamplesPerSec=5.677036223927891, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:21:03,656] [INFO] [timer.py:197:stop] 0/1690, RunningAvgSamplesPerSec=6.327078544945673, CurrSamplesPerSec=5.716653948578718, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:21:14,994] [INFO] [timer.py:197:stop] 0/1692, RunningAvgSamplesPerSec=6.327080143901506, CurrSamplesPerSec=5.695177661829884, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:21:26,497] [INFO] [timer.py:197:stop] 0/1694, RunningAvgSamplesPerSec=6.3269586642354865, CurrSamplesPerSec=5.512899898920243, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:21:37,801] [INFO] [timer.py:197:stop] 0/1696, RunningAvgSamplesPerSec=6.326997931656717, CurrSamplesPerSec=5.730388308477891, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:21:49,125] [INFO] [timer.py:197:stop] 0/1698, RunningAvgSamplesPerSec=6.327020896294555, CurrSamplesPerSec=5.703355560734074, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:22:00,877] [INFO] [logging.py:68:log_dist] [Rank 0] step=850, skipped=5, lr=[9.235555555555556e-06], mom=[[0.9, 0.999]] [2022-12-16 22:22:00,879] [INFO] [timer.py:197:stop] 0/1700, RunningAvgSamplesPerSec=6.327048230282517, CurrSamplesPerSec=5.725344248442001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0166, 'learning_rate': 9.235555555555556e-06, 'epoch': 3.6} [2022-12-16 22:22:12,166] [INFO] [timer.py:197:stop] 0/1702, 
RunningAvgSamplesPerSec=6.327088307183525, CurrSamplesPerSec=5.719386679782303, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:22:23,543] [INFO] [timer.py:197:stop] 0/1704, RunningAvgSamplesPerSec=6.327120891773229, CurrSamplesPerSec=5.709233133050226, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:22:35,131] [INFO] [timer.py:197:stop] 0/1706, RunningAvgSamplesPerSec=6.327090682954414, CurrSamplesPerSec=5.721107852338529, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:22:46,419] [INFO] [timer.py:197:stop] 0/1708, RunningAvgSamplesPerSec=6.327128147702684, CurrSamplesPerSec=5.719666238913165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:22:57,727] [INFO] [timer.py:197:stop] 0/1710, RunningAvgSamplesPerSec=6.327163186849155, CurrSamplesPerSec=5.71158128455646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:23:09,157] [INFO] [timer.py:197:stop] 0/1712, RunningAvgSamplesPerSec=6.3271082387375905, CurrSamplesPerSec=5.706069039842615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:23:20,498] [INFO] [timer.py:197:stop] 0/1714, RunningAvgSamplesPerSec=6.327118251822114, CurrSamplesPerSec=5.709560033560292, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:23:31,806] [INFO] [timer.py:197:stop] 0/1716, RunningAvgSamplesPerSec=6.32716470276775, CurrSamplesPerSec=5.730877909610886, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:23:43,096] [INFO] [timer.py:197:stop] 0/1718, RunningAvgSamplesPerSec=6.327212694616481, CurrSamplesPerSec=5.729023934851759, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:23:54,396] [INFO] [logging.py:68:log_dist] [Rank 0] step=860, skipped=5, lr=[9.213333333333334e-06], mom=[[0.9, 0.999]] [2022-12-16 22:23:54,398] [INFO] [timer.py:197:stop] 0/1720, RunningAvgSamplesPerSec=6.327252695395273, CurrSamplesPerSec=5.7282911397132175, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:24:05,684] [INFO] [timer.py:197:stop] 0/1722, 
RunningAvgSamplesPerSec=6.327292967660206, CurrSamplesPerSec=5.7332882305885375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:24:16,984] [INFO] [timer.py:197:stop] 0/1724, RunningAvgSamplesPerSec=6.327334122464313, CurrSamplesPerSec=5.705825252211353, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:24:28,430] [INFO] [timer.py:197:stop] 0/1726, RunningAvgSamplesPerSec=6.327362763895628, CurrSamplesPerSec=5.7203936594189075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:24:39,753] [INFO] [timer.py:197:stop] 0/1728, RunningAvgSamplesPerSec=6.327388228071011, CurrSamplesPerSec=5.7061243497864975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:24:51,089] [INFO] [timer.py:197:stop] 0/1730, RunningAvgSamplesPerSec=6.327403411302091, CurrSamplesPerSec=5.6941735001864995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:25:02,398] [INFO] [timer.py:197:stop] 0/1732, RunningAvgSamplesPerSec=6.327439575613748, CurrSamplesPerSec=5.738262154434546, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:25:13,746] [INFO] [timer.py:197:stop] 0/1734, RunningAvgSamplesPerSec=6.327447907599327, CurrSamplesPerSec=5.701845849844518, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:25:25,056] [INFO] [timer.py:197:stop] 0/1736, RunningAvgSamplesPerSec=6.3274811248164715, CurrSamplesPerSec=5.718736755597752, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:25:36,352] [INFO] [timer.py:197:stop] 0/1738, RunningAvgSamplesPerSec=6.327525118665114, CurrSamplesPerSec=5.739397030686239, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:25:47,809] [INFO] [logging.py:68:log_dist] [Rank 0] step=870, skipped=5, lr=[9.191111111111111e-06], mom=[[0.9, 0.999]] [2022-12-16 22:25:47,810] [INFO] [timer.py:197:stop] 0/1740, RunningAvgSamplesPerSec=6.327570720702815, CurrSamplesPerSec=5.728890663197119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:25:59,188] [INFO] [timer.py:197:stop] 0/1742, 
RunningAvgSamplesPerSec=6.327554770194363, CurrSamplesPerSec=5.671244133657371, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:26:10,526] [INFO] [timer.py:197:stop] 0/1744, RunningAvgSamplesPerSec=6.327555642826735, CurrSamplesPerSec=5.702799896105481, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:26:21,899] [INFO] [timer.py:197:stop] 0/1746, RunningAvgSamplesPerSec=6.3275406381037325, CurrSamplesPerSec=5.694381745222209, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:26:33,330] [INFO] [timer.py:197:stop] 0/1748, RunningAvgSamplesPerSec=6.3275595001379035, CurrSamplesPerSec=5.700081051921007, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:26:44,667] [INFO] [timer.py:197:stop] 0/1750, RunningAvgSamplesPerSec=6.3275690092916745, CurrSamplesPerSec=5.699173652540018, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0188, 'learning_rate': 9.180000000000002e-06, 'epoch': 3.71} [2022-12-16 22:26:56,176] [INFO] [timer.py:197:stop] 0/1752, RunningAvgSamplesPerSec=6.327599408724447, CurrSamplesPerSec=5.720589685059201, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:27:07,474] [INFO] [timer.py:197:stop] 0/1754, RunningAvgSamplesPerSec=6.327626901835506, CurrSamplesPerSec=5.717025532305319, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:27:19,040] [INFO] [timer.py:197:stop] 0/1756, RunningAvgSamplesPerSec=6.327463841076069, CurrSamplesPerSec=5.456514256106413, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:27:30,379] [INFO] [timer.py:197:stop] 0/1758, RunningAvgSamplesPerSec=6.327479196720069, CurrSamplesPerSec=5.714399670907678, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:27:41,726] [INFO] [logging.py:68:log_dist] [Rank 0] step=880, skipped=5, lr=[9.168888888888889e-06], mom=[[0.9, 0.999]] [2022-12-16 22:27:41,728] [INFO] [timer.py:197:stop] 0/1760, RunningAvgSamplesPerSec=6.327481260641632, CurrSamplesPerSec=5.699712393640301, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-16 22:27:53,131] [INFO] [timer.py:197:stop] 0/1762, RunningAvgSamplesPerSec=6.3274335041248, CurrSamplesPerSec=5.595196403806098, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:28:04,504] [INFO] [timer.py:197:stop] 0/1764, RunningAvgSamplesPerSec=6.327418345128578, CurrSamplesPerSec=5.674870902112147, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:28:16,055] [INFO] [timer.py:197:stop] 0/1766, RunningAvgSamplesPerSec=6.327419979238297, CurrSamplesPerSec=5.698506538894324, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:28:27,389] [INFO] [timer.py:197:stop] 0/1768, RunningAvgSamplesPerSec=6.327411032748365, CurrSamplesPerSec=5.686275554833261, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:28:38,668] [INFO] [timer.py:197:stop] 0/1770, RunningAvgSamplesPerSec=6.327451486650583, CurrSamplesPerSec=5.7193138087887085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:28:50,097] [INFO] [timer.py:197:stop] 0/1772, RunningAvgSamplesPerSec=6.327448629806222, CurrSamplesPerSec=5.677691594771782, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:29:01,797] [INFO] [timer.py:197:stop] 0/1774, RunningAvgSamplesPerSec=6.327447834994487, CurrSamplesPerSec=5.693589432396962, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:29:13,137] [INFO] [timer.py:197:stop] 0/1776, RunningAvgSamplesPerSec=6.3274420899994555, CurrSamplesPerSec=5.6716673572904135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:29:24,741] [INFO] [timer.py:197:stop] 0/1778, RunningAvgSamplesPerSec=6.327465683861914, CurrSamplesPerSec=5.705979769914438, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:29:36,093] [INFO] [logging.py:68:log_dist] [Rank 0] step=890, skipped=5, lr=[9.146666666666667e-06], mom=[[0.9, 0.999]] [2022-12-16 22:29:36,094] [INFO] [timer.py:197:stop] 0/1780, RunningAvgSamplesPerSec=6.327472717879045, CurrSamplesPerSec=5.712534458483042, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 
22:29:47,438] [INFO] [timer.py:197:stop] 0/1782, RunningAvgSamplesPerSec=6.327477872742945, CurrSamplesPerSec=5.687404901098838, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:29:58,859] [INFO] [timer.py:197:stop] 0/1784, RunningAvgSamplesPerSec=6.327497107616166, CurrSamplesPerSec=5.715082435349324, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:30:10,346] [INFO] [timer.py:197:stop] 0/1786, RunningAvgSamplesPerSec=6.327481575537458, CurrSamplesPerSec=5.696479056278363, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:30:21,630] [INFO] [timer.py:197:stop] 0/1788, RunningAvgSamplesPerSec=6.327516818382441, CurrSamplesPerSec=5.706524893701314, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:30:33,134] [INFO] [timer.py:197:stop] 0/1790, RunningAvgSamplesPerSec=6.327544074013899, CurrSamplesPerSec=5.719948994340342, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:30:44,768] [INFO] [timer.py:197:stop] 0/1792, RunningAvgSamplesPerSec=6.327517602571258, CurrSamplesPerSec=5.67557857248875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:30:56,109] [INFO] [timer.py:197:stop] 0/1794, RunningAvgSamplesPerSec=6.327513822697181, CurrSamplesPerSec=5.687254038624659, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:31:07,643] [INFO] [timer.py:197:stop] 0/1796, RunningAvgSamplesPerSec=6.327525078782281, CurrSamplesPerSec=5.715674085571417, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:31:19,212] [INFO] [timer.py:197:stop] 0/1798, RunningAvgSamplesPerSec=6.327477564547856, CurrSamplesPerSec=5.673090628417221, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:31:30,491] [INFO] [logging.py:68:log_dist] [Rank 0] step=900, skipped=5, lr=[9.124444444444444e-06], mom=[[0.9, 0.999]] [2022-12-16 22:31:30,493] [INFO] [timer.py:197:stop] 0/1800, RunningAvgSamplesPerSec=6.327505898154873, CurrSamplesPerSec=5.719943143944835, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0201, 
'learning_rate': 9.124444444444444e-06, 'epoch': 3.81} [2022-12-16 22:31:42,020] [INFO] [timer.py:197:stop] 0/1802, RunningAvgSamplesPerSec=6.327532702887273, CurrSamplesPerSec=5.720189114647878, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:31:53,397] [INFO] [timer.py:197:stop] 0/1804, RunningAvgSamplesPerSec=6.327515890612733, CurrSamplesPerSec=5.675125729502997, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:32:04,723] [INFO] [timer.py:197:stop] 0/1806, RunningAvgSamplesPerSec=6.327534028822243, CurrSamplesPerSec=5.714394561738922, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:32:16,328] [INFO] [timer.py:197:stop] 0/1808, RunningAvgSamplesPerSec=6.327547488856713, CurrSamplesPerSec=5.7109933973552405, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:32:27,941] [INFO] [timer.py:197:stop] 0/1810, RunningAvgSamplesPerSec=6.327525969813436, CurrSamplesPerSec=5.6556872897253445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:32:39,261] [INFO] [timer.py:197:stop] 0/1812, RunningAvgSamplesPerSec=6.327539814732807, CurrSamplesPerSec=5.705926645929653, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:32:50,811] [INFO] [timer.py:197:stop] 0/1814, RunningAvgSamplesPerSec=6.327557862560616, CurrSamplesPerSec=5.702386549923387, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:33:02,582] [INFO] [timer.py:197:stop] 0/1816, RunningAvgSamplesPerSec=6.327521828081135, CurrSamplesPerSec=5.651765591854346, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:33:13,943] [INFO] [timer.py:197:stop] 0/1818, RunningAvgSamplesPerSec=6.327518212922253, CurrSamplesPerSec=5.679664624133516, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:33:25,525] [INFO] [logging.py:68:log_dist] [Rank 0] step=910, skipped=5, lr=[9.102222222222224e-06], mom=[[0.9, 0.999]] [2022-12-16 22:33:25,526] [INFO] [timer.py:197:stop] 0/1820, RunningAvgSamplesPerSec=6.32753376849523, CurrSamplesPerSec=5.699907972659344, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:33:36,981] [INFO] [timer.py:197:stop] 0/1822, RunningAvgSamplesPerSec=6.327490357640737, CurrSamplesPerSec=5.683644643984692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:33:48,325] [INFO] [timer.py:197:stop] 0/1824, RunningAvgSamplesPerSec=6.3274986839712275, CurrSamplesPerSec=5.688544098830273, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:33:59,676] [INFO] [timer.py:197:stop] 0/1826, RunningAvgSamplesPerSec=6.327526659579, CurrSamplesPerSec=5.709614439656007, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:34:11,315] [INFO] [timer.py:197:stop] 0/1828, RunningAvgSamplesPerSec=6.327508865546779, CurrSamplesPerSec=5.671136780061463, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:34:22,675] [INFO] [timer.py:197:stop] 0/1830, RunningAvgSamplesPerSec=6.327503801542838, CurrSamplesPerSec=5.6768350081309285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:34:34,008] [INFO] [timer.py:197:stop] 0/1832, RunningAvgSamplesPerSec=6.327506325462348, CurrSamplesPerSec=5.701579655712555, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:34:45,563] [INFO] [timer.py:197:stop] 0/1834, RunningAvgSamplesPerSec=6.327367375204487, CurrSamplesPerSec=5.699337974721977, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:34:56,901] [INFO] [timer.py:197:stop] 0/1836, RunningAvgSamplesPerSec=6.327379195232485, CurrSamplesPerSec=5.704727132017407, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:35:08,597] [INFO] [timer.py:197:stop] 0/1838, RunningAvgSamplesPerSec=6.3271557453667935, CurrSamplesPerSec=5.376965988972667, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:35:20,153] [INFO] [logging.py:68:log_dist] [Rank 0] step=920, skipped=5, lr=[9.080000000000001e-06], mom=[[0.9, 0.999]] [2022-12-16 22:35:20,155] [INFO] [timer.py:197:stop] 0/1840, RunningAvgSamplesPerSec=6.327150200517756, CurrSamplesPerSec=5.699684558608608, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:35:31,485] [INFO] [timer.py:197:stop] 0/1842, RunningAvgSamplesPerSec=6.327162084321393, CurrSamplesPerSec=5.706075347062752, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:35:42,800] [INFO] [timer.py:197:stop] 0/1844, RunningAvgSamplesPerSec=6.327168810082029, CurrSamplesPerSec=5.691259909466644, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:35:54,479] [INFO] [timer.py:197:stop] 0/1846, RunningAvgSamplesPerSec=6.32719038402284, CurrSamplesPerSec=5.714115760953486, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:36:05,815] [INFO] [timer.py:197:stop] 0/1848, RunningAvgSamplesPerSec=6.327205165796917, CurrSamplesPerSec=5.698616624872424, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:36:17,200] [INFO] [timer.py:197:stop] 0/1850, RunningAvgSamplesPerSec=6.327176775421694, CurrSamplesPerSec=5.642583654232158, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0204, 'learning_rate': 9.06888888888889e-06, 'epoch': 3.92} [2022-12-16 22:36:28,679] [INFO] [timer.py:197:stop] 0/1852, RunningAvgSamplesPerSec=6.327188291303244, CurrSamplesPerSec=5.703051179780791, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:36:40,131] [INFO] [timer.py:197:stop] 0/1854, RunningAvgSamplesPerSec=6.327226112581387, CurrSamplesPerSec=5.734747256027386, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:36:51,510] [INFO] [timer.py:197:stop] 0/1856, RunningAvgSamplesPerSec=6.327212472477824, CurrSamplesPerSec=5.660549146666839, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:37:02,922] [INFO] [timer.py:197:stop] 0/1858, RunningAvgSamplesPerSec=6.327249935928411, CurrSamplesPerSec=5.742332367647951, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:37:14,245] [INFO] [logging.py:68:log_dist] [Rank 0] step=930, skipped=5, lr=[9.057777777777779e-06], mom=[[0.9, 0.999]] [2022-12-16 22:37:14,247] [INFO] [timer.py:197:stop] 0/1860, 
RunningAvgSamplesPerSec=6.327266281775782, CurrSamplesPerSec=5.716212055721702, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:37:25,655] [INFO] [timer.py:197:stop] 0/1862, RunningAvgSamplesPerSec=6.3272304696234105, CurrSamplesPerSec=5.653700635485529, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:37:36,996] [INFO] [timer.py:197:stop] 0/1864, RunningAvgSamplesPerSec=6.327240547607681, CurrSamplesPerSec=5.701383235398178, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:37:48,403] [INFO] [timer.py:197:stop] 0/1866, RunningAvgSamplesPerSec=6.327233090336316, CurrSamplesPerSec=5.6837647469343695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:37:59,744] [INFO] [timer.py:197:stop] 0/1868, RunningAvgSamplesPerSec=6.327240665312632, CurrSamplesPerSec=5.687485878507279, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:38:11,229] [INFO] [timer.py:197:stop] 0/1870, RunningAvgSamplesPerSec=6.327244748394163, CurrSamplesPerSec=5.691447427173725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:38:22,637] [INFO] [timer.py:197:stop] 0/1872, RunningAvgSamplesPerSec=6.3272248737369186, CurrSamplesPerSec=5.657825353438808, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:38:34,073] [INFO] [timer.py:197:stop] 0/1874, RunningAvgSamplesPerSec=6.327195858796056, CurrSamplesPerSec=5.637339864727594, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:38:45,464] [INFO] [timer.py:197:stop] 0/1876, RunningAvgSamplesPerSec=6.327186040553401, CurrSamplesPerSec=5.683998228114204, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:38:56,954] [INFO] [timer.py:197:stop] 0/1878, RunningAvgSamplesPerSec=6.327190810218982, CurrSamplesPerSec=5.687580837096232, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:39:08,592] [INFO] [logging.py:68:log_dist] [Rank 0] step=940, skipped=5, lr=[9.035555555555556e-06], mom=[[0.9, 0.999]] [2022-12-16 22:39:08,594] [INFO] [timer.py:197:stop] 0/1880, 
RunningAvgSamplesPerSec=6.32700099351177, CurrSamplesPerSec=5.406531701958022, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:39:19,900] [INFO] [timer.py:197:stop] 0/1882, RunningAvgSamplesPerSec=6.327025337615598, CurrSamplesPerSec=5.704069383941154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:39:31,203] [INFO] [timer.py:197:stop] 0/1884, RunningAvgSamplesPerSec=6.327061342294388, CurrSamplesPerSec=5.719577030502328, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:39:42,655] [INFO] [timer.py:197:stop] 0/1886, RunningAvgSamplesPerSec=6.326975385899959, CurrSamplesPerSec=5.544955760650507, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:39:51,166] [INFO] [timer.py:197:stop] 0/1888, RunningAvgSamplesPerSec=6.3286334775112, CurrSamplesPerSec=10.195979896061841, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:40:02,504] [INFO] [timer.py:197:stop] 0/1890, RunningAvgSamplesPerSec=6.328647509479657, CurrSamplesPerSec=5.710262048965369, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:40:14,194] [INFO] [timer.py:197:stop] 0/1892, RunningAvgSamplesPerSec=6.3284261862826945, CurrSamplesPerSec=5.367132041920802, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:40:25,543] [INFO] [timer.py:197:stop] 0/1894, RunningAvgSamplesPerSec=6.32843106698793, CurrSamplesPerSec=5.6931508564600115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:40:36,849] [INFO] [timer.py:197:stop] 0/1896, RunningAvgSamplesPerSec=6.328464305906655, CurrSamplesPerSec=5.728721698325411, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:40:48,477] [INFO] [timer.py:197:stop] 0/1898, RunningAvgSamplesPerSec=6.328273681309304, CurrSamplesPerSec=5.381708745107983, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:40:59,811] [INFO] [logging.py:68:log_dist] [Rank 0] step=950, skipped=5, lr=[9.013333333333334e-06], mom=[[0.9, 0.999]] [2022-12-16 22:40:59,813] [INFO] [timer.py:197:stop] 0/1900, 
RunningAvgSamplesPerSec=6.328286660507801, CurrSamplesPerSec=5.713606642642039, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0177, 'learning_rate': 9.013333333333334e-06, 'epoch': 4.03} [2022-12-16 22:41:11,142] [INFO] [timer.py:197:stop] 0/1902, RunningAvgSamplesPerSec=6.328296409917468, CurrSamplesPerSec=5.689266276962426, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:41:22,496] [INFO] [timer.py:197:stop] 0/1904, RunningAvgSamplesPerSec=6.328285659790885, CurrSamplesPerSec=5.687977576563542, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:41:33,999] [INFO] [timer.py:197:stop] 0/1906, RunningAvgSamplesPerSec=6.328301415114474, CurrSamplesPerSec=5.708266979596024, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:41:45,322] [INFO] [timer.py:197:stop] 0/1908, RunningAvgSamplesPerSec=6.328300809211748, CurrSamplesPerSec=5.697474840820759, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:41:56,767] [INFO] [timer.py:197:stop] 0/1910, RunningAvgSamplesPerSec=6.328245084111687, CurrSamplesPerSec=5.605507272331195, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:42:08,230] [INFO] [timer.py:197:stop] 0/1912, RunningAvgSamplesPerSec=6.32823748333477, CurrSamplesPerSec=5.698080509950753, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:42:19,577] [INFO] [timer.py:197:stop] 0/1914, RunningAvgSamplesPerSec=6.328233811772137, CurrSamplesPerSec=5.692649088685696, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:42:31,224] [INFO] [timer.py:197:stop] 0/1916, RunningAvgSamplesPerSec=6.328043612491889, CurrSamplesPerSec=5.422718309822167, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:42:42,605] [INFO] [timer.py:197:stop] 0/1918, RunningAvgSamplesPerSec=6.3280308734851065, CurrSamplesPerSec=5.674925368944043, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:42:53,953] [INFO] [logging.py:68:log_dist] [Rank 0] step=960, skipped=5, lr=[8.991111111111112e-06], mom=[[0.9, 0.999]] 
[2022-12-16 22:42:53,955] [INFO] [timer.py:197:stop] 0/1920, RunningAvgSamplesPerSec=6.328023141891153, CurrSamplesPerSec=5.678259192763114, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:43:05,610] [INFO] [timer.py:197:stop] 0/1922, RunningAvgSamplesPerSec=6.327827961176475, CurrSamplesPerSec=5.4101643964847765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:43:16,922] [INFO] [timer.py:197:stop] 0/1924, RunningAvgSamplesPerSec=6.32784426629273, CurrSamplesPerSec=5.692889095536372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:43:28,308] [INFO] [timer.py:197:stop] 0/1926, RunningAvgSamplesPerSec=6.327825843494196, CurrSamplesPerSec=5.682812727446553, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:43:39,676] [INFO] [timer.py:197:stop] 0/1928, RunningAvgSamplesPerSec=6.327806702706537, CurrSamplesPerSec=5.673969112151267, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:43:50,965] [INFO] [timer.py:197:stop] 0/1930, RunningAvgSamplesPerSec=6.327840292960915, CurrSamplesPerSec=5.721387335868599, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:44:02,315] [INFO] [timer.py:197:stop] 0/1932, RunningAvgSamplesPerSec=6.327859995305297, CurrSamplesPerSec=5.708636502563485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:44:13,930] [INFO] [timer.py:197:stop] 0/1934, RunningAvgSamplesPerSec=6.327838062306126, CurrSamplesPerSec=5.69510153996506, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:44:25,317] [INFO] [timer.py:197:stop] 0/1936, RunningAvgSamplesPerSec=6.327819423140711, CurrSamplesPerSec=5.685719600318748, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:44:36,691] [INFO] [timer.py:197:stop] 0/1938, RunningAvgSamplesPerSec=6.327798215144098, CurrSamplesPerSec=5.670691833157158, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:44:48,357] [INFO] [logging.py:68:log_dist] [Rank 0] step=970, skipped=5, lr=[8.96888888888889e-06], mom=[[0.9, 0.999]] [2022-12-16 
22:44:48,359] [INFO] [timer.py:197:stop] 0/1940, RunningAvgSamplesPerSec=6.327595320689516, CurrSamplesPerSec=5.7164657401281005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:44:59,718] [INFO] [timer.py:197:stop] 0/1942, RunningAvgSamplesPerSec=6.327599374628342, CurrSamplesPerSec=5.7112604712469635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:45:11,048] [INFO] [timer.py:197:stop] 0/1944, RunningAvgSamplesPerSec=6.327614975553183, CurrSamplesPerSec=5.709293604393959, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:45:22,718] [INFO] [timer.py:197:stop] 0/1946, RunningAvgSamplesPerSec=6.327627028381739, CurrSamplesPerSec=5.716932997109624, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:45:34,352] [INFO] [timer.py:197:stop] 0/1948, RunningAvgSamplesPerSec=6.327614865246932, CurrSamplesPerSec=5.685503318103534, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:45:45,765] [INFO] [timer.py:197:stop] 0/1950, RunningAvgSamplesPerSec=6.327580884082605, CurrSamplesPerSec=5.649820694504452, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0101, 'learning_rate': 8.957777777777778e-06, 'epoch': 4.13} [2022-12-16 22:45:57,261] [INFO] [timer.py:197:stop] 0/1952, RunningAvgSamplesPerSec=6.32759681131148, CurrSamplesPerSec=5.711184404892128, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:46:08,750] [INFO] [timer.py:197:stop] 0/1954, RunningAvgSamplesPerSec=6.327597897269227, CurrSamplesPerSec=5.704721797659587, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:46:20,109] [INFO] [timer.py:197:stop] 0/1956, RunningAvgSamplesPerSec=6.327596837568829, CurrSamplesPerSec=5.676017325551704, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:46:31,499] [INFO] [timer.py:197:stop] 0/1958, RunningAvgSamplesPerSec=6.3276281314148894, CurrSamplesPerSec=5.714223774783396, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:46:42,824] [INFO] [logging.py:68:log_dist] [Rank 0] step=980, skipped=5, 
lr=[8.946666666666669e-06], mom=[[0.9, 0.999]] [2022-12-16 22:46:42,826] [INFO] [timer.py:197:stop] 0/1960, RunningAvgSamplesPerSec=6.327636555796214, CurrSamplesPerSec=5.699254723419047, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:46:54,425] [INFO] [timer.py:197:stop] 0/1962, RunningAvgSamplesPerSec=6.327482014444832, CurrSamplesPerSec=5.433102247841417, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:47:05,753] [INFO] [timer.py:197:stop] 0/1964, RunningAvgSamplesPerSec=6.327500616331316, CurrSamplesPerSec=5.705595310433084, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:47:17,115] [INFO] [timer.py:197:stop] 0/1966, RunningAvgSamplesPerSec=6.327492170652153, CurrSamplesPerSec=5.685668538891042, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:47:28,632] [INFO] [timer.py:197:stop] 0/1968, RunningAvgSamplesPerSec=6.327392003434301, CurrSamplesPerSec=5.509042049946703, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:47:40,071] [INFO] [timer.py:197:stop] 0/1970, RunningAvgSamplesPerSec=6.327330482222493, CurrSamplesPerSec=5.6039946277232415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:47:51,426] [INFO] [timer.py:197:stop] 0/1972, RunningAvgSamplesPerSec=6.3273298008094425, CurrSamplesPerSec=5.7071977690610645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:48:02,785] [INFO] [timer.py:197:stop] 0/1974, RunningAvgSamplesPerSec=6.327328324697508, CurrSamplesPerSec=5.662362644142834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:48:14,144] [INFO] [timer.py:197:stop] 0/1976, RunningAvgSamplesPerSec=6.327325920760691, CurrSamplesPerSec=5.696949096715646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:48:25,463] [INFO] [timer.py:197:stop] 0/1978, RunningAvgSamplesPerSec=6.3273483838352025, CurrSamplesPerSec=5.7194220192270775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:48:37,007] [INFO] [logging.py:68:log_dist] [Rank 0] step=990, skipped=5, 
lr=[8.924444444444446e-06], mom=[[0.9, 0.999]] [2022-12-16 22:48:37,009] [INFO] [timer.py:197:stop] 0/1980, RunningAvgSamplesPerSec=6.327227334854434, CurrSamplesPerSec=5.493117913683248, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:48:48,361] [INFO] [timer.py:197:stop] 0/1982, RunningAvgSamplesPerSec=6.327218692310974, CurrSamplesPerSec=5.67055886580819, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:48:59,691] [INFO] [timer.py:197:stop] 0/1984, RunningAvgSamplesPerSec=6.3272315122953255, CurrSamplesPerSec=5.713750149849061, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:49:11,345] [INFO] [timer.py:197:stop] 0/1986, RunningAvgSamplesPerSec=6.327033256452263, CurrSamplesPerSec=5.3792464182237305, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:49:22,687] [INFO] [timer.py:197:stop] 0/1988, RunningAvgSamplesPerSec=6.32703996328013, CurrSamplesPerSec=5.709569748858481, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:49:34,309] [INFO] [timer.py:197:stop] 0/1990, RunningAvgSamplesPerSec=6.327031926644361, CurrSamplesPerSec=5.697876832079683, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:49:46,057] [INFO] [timer.py:197:stop] 0/1992, RunningAvgSamplesPerSec=6.327036540568657, CurrSamplesPerSec=5.711242487325227, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:49:57,349] [INFO] [timer.py:197:stop] 0/1994, RunningAvgSamplesPerSec=6.32706622307384, CurrSamplesPerSec=5.728870856379531, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:50:08,810] [INFO] [timer.py:197:stop] 0/1996, RunningAvgSamplesPerSec=6.327081375070768, CurrSamplesPerSec=5.718303554862198, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:50:20,250] [INFO] [timer.py:197:stop] 0/1998, RunningAvgSamplesPerSec=6.327057320293952, CurrSamplesPerSec=5.702582070221039, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 22:50:31,570] [INFO] [logging.py:68:log_dist] [Rank 0] step=1000, skipped=5, 
lr=[8.902222222222224e-06], mom=[[0.9, 0.999]]
[2022-12-16 22:50:31,571] [INFO] [timer.py:197:stop] 0/2000, RunningAvgSamplesPerSec=6.327079086755388, CurrSamplesPerSec=5.695145037924359, MemAllocated=3.0GB, MaxMemAllocated=19.53GB
{'loss': 0.0106, 'learning_rate': 8.902222222222224e-06, 'epoch': 4.24}
{'eval_loss': 0.1624755859375, 'eval_wer': 9.988766321062228, 'eval_runtime': 2123.3545, 'eval_samples_per_second': 3.633, 'eval_steps_per_second': 0.454, 'epoch': 4.24}
[2022-12-16 23:25:59,883] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step1000 is begin to save!
[2022-12-16 23:25:59,893] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-1000/global_step1000/mp_rank_00_model_states.pt
[2022-12-16 23:25:59,893] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-1000/global_step1000/mp_rank_00_model_states.pt...
[2022-12-16 23:26:04,551] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-1000/global_step1000/mp_rank_00_model_states.pt.
[2022-12-16 23:26:04,552] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-1000/global_step1000/zero_pp_rank_0_mp_rank_00_optim_states.pt...
[2022-12-16 23:26:26,839] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-1000/global_step1000/zero_pp_rank_0_mp_rank_00_optim_states.pt.
[2022-12-16 23:26:26,839] [INFO] [engine.py:3269:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-1000/global_step1000/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2022-12-16 23:26:26,839] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now!
[2022-12-16 23:28:19,593] [INFO] [timer.py:197:stop] 0/2002, RunningAvgSamplesPerSec=6.32691884455914, CurrSamplesPerSec=5.4362697760722805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:28:30,905] [INFO] [timer.py:197:stop] 0/2004, RunningAvgSamplesPerSec=6.326938914682652, CurrSamplesPerSec=5.695573286638631, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:28:42,212] [INFO] [timer.py:197:stop] 0/2006, RunningAvgSamplesPerSec=6.326952016228341, CurrSamplesPerSec=5.708716628834756, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:28:53,528] [INFO] [timer.py:197:stop] 0/2008, RunningAvgSamplesPerSec=6.326964531480429, CurrSamplesPerSec=5.706633106005204, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:29:04,858] [INFO] [timer.py:197:stop] 0/2010, RunningAvgSamplesPerSec=6.326969594684604, CurrSamplesPerSec=5.7054209260205395, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:29:16,139] [INFO] [timer.py:197:stop] 0/2012, RunningAvgSamplesPerSec=6.327003835032759, CurrSamplesPerSec=5.727500606066192, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:29:27,485] [INFO] [timer.py:197:stop] 0/2014, RunningAvgSamplesPerSec=6.327001593227679, CurrSamplesPerSec=5.687480335337347, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:29:38,758] [INFO] [timer.py:197:stop] 0/2016, RunningAvgSamplesPerSec=6.327025476720991, CurrSamplesPerSec=5.725522783802955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:29:50,077] [INFO] [timer.py:197:stop] 0/2018, RunningAvgSamplesPerSec=6.327022878790163, CurrSamplesPerSec=5.6997549938674155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:30:01,423] [INFO] [logging.py:68:log_dist] [Rank 0] step=1010, skipped=5, lr=[8.880000000000001e-06], mom=[[0.9, 0.999]] [2022-12-16 23:30:01,424] [INFO] [timer.py:197:stop] 0/2020, RunningAvgSamplesPerSec=6.327014036395244, CurrSamplesPerSec=5.670120237564401, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 
23:30:12,750] [INFO] [timer.py:197:stop] 0/2022, RunningAvgSamplesPerSec=6.3270172363677, CurrSamplesPerSec=5.679545415403237, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:30:24,037] [INFO] [timer.py:197:stop] 0/2024, RunningAvgSamplesPerSec=6.32704175668712, CurrSamplesPerSec=5.717044770261595, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:30:35,311] [INFO] [timer.py:197:stop] 0/2026, RunningAvgSamplesPerSec=6.327067537750394, CurrSamplesPerSec=5.727426794953754, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:30:46,607] [INFO] [timer.py:197:stop] 0/2028, RunningAvgSamplesPerSec=6.327089040336206, CurrSamplesPerSec=5.693161240461662, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:30:58,050] [INFO] [timer.py:197:stop] 0/2030, RunningAvgSamplesPerSec=6.3271135447751226, CurrSamplesPerSec=5.718585688270621, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:31:09,355] [INFO] [timer.py:197:stop] 0/2032, RunningAvgSamplesPerSec=6.327130920305157, CurrSamplesPerSec=5.722399655712234, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:31:20,707] [INFO] [timer.py:197:stop] 0/2034, RunningAvgSamplesPerSec=6.327119374960312, CurrSamplesPerSec=5.675644093117829, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:31:32,059] [INFO] [timer.py:197:stop] 0/2036, RunningAvgSamplesPerSec=6.3270974490717675, CurrSamplesPerSec=5.650667006391652, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:31:43,453] [INFO] [timer.py:197:stop] 0/2038, RunningAvgSamplesPerSec=6.327060451393332, CurrSamplesPerSec=5.645447844234822, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:31:54,806] [INFO] [logging.py:68:log_dist] [Rank 0] step=1020, skipped=5, lr=[8.857777777777779e-06], mom=[[0.9, 0.999]] [2022-12-16 23:31:54,808] [INFO] [timer.py:197:stop] 0/2040, RunningAvgSamplesPerSec=6.327042536606945, CurrSamplesPerSec=5.676920727197855, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:32:06,237] 
[INFO] [timer.py:197:stop] 0/2042, RunningAvgSamplesPerSec=6.327001069035571, CurrSamplesPerSec=5.630803660428348, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:32:17,545] [INFO] [timer.py:197:stop] 0/2044, RunningAvgSamplesPerSec=6.327008785154406, CurrSamplesPerSec=5.696139630864212, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:32:28,911] [INFO] [timer.py:197:stop] 0/2046, RunningAvgSamplesPerSec=6.326978699203454, CurrSamplesPerSec=5.636375402394337, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:32:40,341] [INFO] [timer.py:197:stop] 0/2048, RunningAvgSamplesPerSec=6.326958900442164, CurrSamplesPerSec=5.643415934825894, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:32:51,716] [INFO] [timer.py:197:stop] 0/2050, RunningAvgSamplesPerSec=6.326956125505802, CurrSamplesPerSec=5.689418934411699, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0098, 'learning_rate': 8.846666666666668e-06, 'epoch': 4.34} [2022-12-16 23:33:03,018] [INFO] [timer.py:197:stop] 0/2052, RunningAvgSamplesPerSec=6.3269793020330125, CurrSamplesPerSec=5.7128155365674695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:33:14,345] [INFO] [timer.py:197:stop] 0/2054, RunningAvgSamplesPerSec=6.326988840063246, CurrSamplesPerSec=5.705008897278177, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:33:25,698] [INFO] [timer.py:197:stop] 0/2056, RunningAvgSamplesPerSec=6.3269836079402495, CurrSamplesPerSec=5.6747514146635405, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:33:37,014] [INFO] [timer.py:197:stop] 0/2058, RunningAvgSamplesPerSec=6.326998426808016, CurrSamplesPerSec=5.708111852476962, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:33:48,355] [INFO] [logging.py:68:log_dist] [Rank 0] step=1030, skipped=5, lr=[8.835555555555557e-06], mom=[[0.9, 0.999]] [2022-12-16 23:33:48,357] [INFO] [timer.py:197:stop] 0/2060, RunningAvgSamplesPerSec=6.326992420904756, CurrSamplesPerSec=5.670153533559918, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:33:59,735] [INFO] [timer.py:197:stop] 0/2062, RunningAvgSamplesPerSec=6.326985330959451, CurrSamplesPerSec=5.671009542558283, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:34:11,227] [INFO] [timer.py:197:stop] 0/2064, RunningAvgSamplesPerSec=6.326965287952393, CurrSamplesPerSec=5.665643214621001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:34:22,581] [INFO] [timer.py:197:stop] 0/2066, RunningAvgSamplesPerSec=6.326978858724929, CurrSamplesPerSec=5.681988992282914, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:34:33,984] [INFO] [timer.py:197:stop] 0/2068, RunningAvgSamplesPerSec=6.3269599699551415, CurrSamplesPerSec=5.669742032086501, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:34:45,300] [INFO] [timer.py:197:stop] 0/2070, RunningAvgSamplesPerSec=6.326969935394964, CurrSamplesPerSec=5.712483643684802, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:34:56,646] [INFO] [timer.py:197:stop] 0/2072, RunningAvgSamplesPerSec=6.326979694877666, CurrSamplesPerSec=5.7023923644599925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:35:08,016] [INFO] [timer.py:197:stop] 0/2074, RunningAvgSamplesPerSec=6.326982548684312, CurrSamplesPerSec=5.703492736656249, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:35:19,363] [INFO] [timer.py:197:stop] 0/2076, RunningAvgSamplesPerSec=6.326988173584383, CurrSamplesPerSec=5.683898093630149, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:35:30,742] [INFO] [timer.py:197:stop] 0/2078, RunningAvgSamplesPerSec=6.3269830341859095, CurrSamplesPerSec=5.673695441170683, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:35:42,209] [INFO] [logging.py:68:log_dist] [Rank 0] step=1040, skipped=5, lr=[8.813333333333334e-06], mom=[[0.9, 0.999]] [2022-12-16 23:35:42,211] [INFO] [timer.py:197:stop] 0/2080, RunningAvgSamplesPerSec=6.326975967919922, CurrSamplesPerSec=5.667466200267374, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:35:53,587] [INFO] [timer.py:197:stop] 0/2082, RunningAvgSamplesPerSec=6.32695109038253, CurrSamplesPerSec=5.666170611529979, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:36:04,935] [INFO] [timer.py:197:stop] 0/2084, RunningAvgSamplesPerSec=6.326944440471758, CurrSamplesPerSec=5.69634632742204, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:36:16,251] [INFO] [timer.py:197:stop] 0/2086, RunningAvgSamplesPerSec=6.326958149657141, CurrSamplesPerSec=5.7132231002860445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:36:27,626] [INFO] [timer.py:197:stop] 0/2088, RunningAvgSamplesPerSec=6.326936819769224, CurrSamplesPerSec=5.6444559187739705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:36:39,014] [INFO] [timer.py:197:stop] 0/2090, RunningAvgSamplesPerSec=6.326958427850613, CurrSamplesPerSec=5.718727009077727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:36:50,379] [INFO] [timer.py:197:stop] 0/2092, RunningAvgSamplesPerSec=6.326959830114065, CurrSamplesPerSec=5.6926826497948, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:37:01,709] [INFO] [timer.py:197:stop] 0/2094, RunningAvgSamplesPerSec=6.326965638724746, CurrSamplesPerSec=5.685005305374708, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:37:13,012] [INFO] [timer.py:197:stop] 0/2096, RunningAvgSamplesPerSec=6.3269861274948385, CurrSamplesPerSec=5.728556410931646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:37:24,339] [INFO] [timer.py:197:stop] 0/2098, RunningAvgSamplesPerSec=6.326991368641418, CurrSamplesPerSec=5.6820460013056815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:37:35,844] [INFO] [logging.py:68:log_dist] [Rank 0] step=1050, skipped=5, lr=[8.791111111111112e-06], mom=[[0.9, 0.999]] [2022-12-16 23:37:35,846] [INFO] [timer.py:197:stop] 0/2100, RunningAvgSamplesPerSec=6.326890220539331, CurrSamplesPerSec=5.5338151432021565, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0109, 'learning_rate': 8.791111111111112e-06, 'epoch': 4.45} [2022-12-16 23:37:47,173] [INFO] [timer.py:197:stop] 0/2102, RunningAvgSamplesPerSec=6.326888480761292, CurrSamplesPerSec=5.678281053440078, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:37:58,497] [INFO] [timer.py:197:stop] 0/2104, RunningAvgSamplesPerSec=6.326901461244185, CurrSamplesPerSec=5.715132566323613, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:38:09,962] [INFO] [timer.py:197:stop] 0/2106, RunningAvgSamplesPerSec=6.326842751936067, CurrSamplesPerSec=5.588156937021963, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:38:21,235] [INFO] [timer.py:197:stop] 0/2108, RunningAvgSamplesPerSec=6.326883905218231, CurrSamplesPerSec=5.745308582057639, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:38:32,501] [INFO] [timer.py:197:stop] 0/2110, RunningAvgSamplesPerSec=6.326919605153342, CurrSamplesPerSec=5.71905889677904, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:38:44,049] [INFO] [timer.py:197:stop] 0/2112, RunningAvgSamplesPerSec=6.3267923986561145, CurrSamplesPerSec=5.4640084207893445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:38:55,381] [INFO] [timer.py:197:stop] 0/2114, RunningAvgSamplesPerSec=6.326795940444241, CurrSamplesPerSec=5.689030433691431, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:39:06,736] [INFO] [timer.py:197:stop] 0/2116, RunningAvgSamplesPerSec=6.326797512321113, CurrSamplesPerSec=5.684546141744876, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:39:18,197] [INFO] [timer.py:197:stop] 0/2118, RunningAvgSamplesPerSec=6.326724028446822, CurrSamplesPerSec=5.569149724324082, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:39:29,514] [INFO] [logging.py:68:log_dist] [Rank 0] step=1060, skipped=5, lr=[8.76888888888889e-06], mom=[[0.9, 0.999]] [2022-12-16 23:39:29,516] [INFO] [timer.py:197:stop] 0/2120, 
RunningAvgSamplesPerSec=6.326731548822072, CurrSamplesPerSec=5.7124179991115005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:39:40,843] [INFO] [timer.py:197:stop] 0/2122, RunningAvgSamplesPerSec=6.326737425035551, CurrSamplesPerSec=5.714336415176261, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:39:52,446] [INFO] [timer.py:197:stop] 0/2124, RunningAvgSamplesPerSec=6.326762471651918, CurrSamplesPerSec=5.718454120004614, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:40:03,786] [INFO] [timer.py:197:stop] 0/2126, RunningAvgSamplesPerSec=6.326765987597059, CurrSamplesPerSec=5.6890827612364046, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:40:15,174] [INFO] [timer.py:197:stop] 0/2128, RunningAvgSamplesPerSec=6.3267923368660135, CurrSamplesPerSec=5.709115594022861, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:40:26,696] [INFO] [timer.py:197:stop] 0/2130, RunningAvgSamplesPerSec=6.326821169454405, CurrSamplesPerSec=5.706149094055909, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:40:38,041] [INFO] [timer.py:197:stop] 0/2132, RunningAvgSamplesPerSec=6.326823566813653, CurrSamplesPerSec=5.685069358215456, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:40:49,504] [INFO] [timer.py:197:stop] 0/2134, RunningAvgSamplesPerSec=6.326811885483646, CurrSamplesPerSec=5.675575692495808, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:41:00,946] [INFO] [timer.py:197:stop] 0/2136, RunningAvgSamplesPerSec=6.326799375130904, CurrSamplesPerSec=5.682672694867065, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:41:12,263] [INFO] [timer.py:197:stop] 0/2138, RunningAvgSamplesPerSec=6.326803172324738, CurrSamplesPerSec=5.6830444459960425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:41:23,789] [INFO] [logging.py:68:log_dist] [Rank 0] step=1070, skipped=5, lr=[8.746666666666667e-06], mom=[[0.9, 0.999]] [2022-12-16 23:41:23,791] [INFO] [timer.py:197:stop] 0/2140, 
RunningAvgSamplesPerSec=6.326814093016492, CurrSamplesPerSec=5.6949693585774135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:41:35,241] [INFO] [timer.py:197:stop] 0/2142, RunningAvgSamplesPerSec=6.326802233599372, CurrSamplesPerSec=5.693648123620178, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:41:46,557] [INFO] [timer.py:197:stop] 0/2144, RunningAvgSamplesPerSec=6.326814718958849, CurrSamplesPerSec=5.692429864700926, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:41:57,844] [INFO] [timer.py:197:stop] 0/2146, RunningAvgSamplesPerSec=6.326832085580104, CurrSamplesPerSec=5.6958365030451725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:42:09,151] [INFO] [timer.py:197:stop] 0/2148, RunningAvgSamplesPerSec=6.326850473492747, CurrSamplesPerSec=5.708753293424126, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:42:20,475] [INFO] [timer.py:197:stop] 0/2150, RunningAvgSamplesPerSec=6.326859635258952, CurrSamplesPerSec=5.688270707613868, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0098, 'learning_rate': 8.735555555555556e-06, 'epoch': 4.56} [2022-12-16 23:42:31,769] [INFO] [timer.py:197:stop] 0/2152, RunningAvgSamplesPerSec=6.326886225644251, CurrSamplesPerSec=5.717377436917904, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:42:43,104] [INFO] [timer.py:197:stop] 0/2154, RunningAvgSamplesPerSec=6.3268885958737, CurrSamplesPerSec=5.718993344686862, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:42:54,383] [INFO] [timer.py:197:stop] 0/2156, RunningAvgSamplesPerSec=6.326912510207333, CurrSamplesPerSec=5.71347919482766, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:43:05,649] [INFO] [timer.py:197:stop] 0/2158, RunningAvgSamplesPerSec=6.326952236351088, CurrSamplesPerSec=5.745678243164284, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:43:17,030] [INFO] [logging.py:68:log_dist] [Rank 0] step=1080, skipped=5, lr=[8.724444444444445e-06], mom=[[0.9, 0.999]] 
[2022-12-16 23:43:17,032] [INFO] [timer.py:197:stop] 0/2160, RunningAvgSamplesPerSec=6.326925724205436, CurrSamplesPerSec=5.710000412665337, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:43:28,413] [INFO] [timer.py:197:stop] 0/2162, RunningAvgSamplesPerSec=6.326951473177989, CurrSamplesPerSec=5.713769365768101, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:43:40,014] [INFO] [timer.py:197:stop] 0/2164, RunningAvgSamplesPerSec=6.326797110978181, CurrSamplesPerSec=5.410640719876528, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:43:51,410] [INFO] [timer.py:197:stop] 0/2166, RunningAvgSamplesPerSec=6.326801039464514, CurrSamplesPerSec=5.689852111364271, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:44:02,776] [INFO] [timer.py:197:stop] 0/2168, RunningAvgSamplesPerSec=6.3268023012309405, CurrSamplesPerSec=5.672488341976959, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:44:14,145] [INFO] [timer.py:197:stop] 0/2170, RunningAvgSamplesPerSec=6.3267722977202325, CurrSamplesPerSec=5.65816762228027, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:44:25,666] [INFO] [timer.py:197:stop] 0/2172, RunningAvgSamplesPerSec=6.326781274881491, CurrSamplesPerSec=5.69896820264145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:44:36,942] [INFO] [timer.py:197:stop] 0/2174, RunningAvgSamplesPerSec=6.326807176968383, CurrSamplesPerSec=5.711531701963128, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:44:48,246] [INFO] [timer.py:197:stop] 0/2176, RunningAvgSamplesPerSec=6.326829037500474, CurrSamplesPerSec=5.716765467323886, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:44:59,728] [INFO] [timer.py:197:stop] 0/2178, RunningAvgSamplesPerSec=6.3268633695582235, CurrSamplesPerSec=5.7243706817524425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:45:11,026] [INFO] [logging.py:68:log_dist] [Rank 0] step=1090, skipped=5, lr=[8.702222222222222e-06], mom=[[0.9, 0.999]] [2022-12-16 
23:45:11,027] [INFO] [timer.py:197:stop] 0/2180, RunningAvgSamplesPerSec=6.326867601605905, CurrSamplesPerSec=5.683175111827742, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:45:22,528] [INFO] [timer.py:197:stop] 0/2182, RunningAvgSamplesPerSec=6.3267694111200745, CurrSamplesPerSec=5.507590052899119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:45:33,901] [INFO] [timer.py:197:stop] 0/2184, RunningAvgSamplesPerSec=6.3267964922491355, CurrSamplesPerSec=5.718324506818116, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:45:45,296] [INFO] [timer.py:197:stop] 0/2186, RunningAvgSamplesPerSec=6.326810767985056, CurrSamplesPerSec=5.721014453371941, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:45:56,924] [INFO] [timer.py:197:stop] 0/2188, RunningAvgSamplesPerSec=6.326635937130851, CurrSamplesPerSec=5.385385720838578, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:46:08,265] [INFO] [timer.py:197:stop] 0/2190, RunningAvgSamplesPerSec=6.326648563105169, CurrSamplesPerSec=5.7219939529667325, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:46:19,609] [INFO] [timer.py:197:stop] 0/2192, RunningAvgSamplesPerSec=6.326656863235978, CurrSamplesPerSec=5.6884170433826915, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:46:30,957] [INFO] [timer.py:197:stop] 0/2194, RunningAvgSamplesPerSec=6.326641016017453, CurrSamplesPerSec=5.63037116066852, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:46:42,464] [INFO] [timer.py:197:stop] 0/2196, RunningAvgSamplesPerSec=6.326632950912346, CurrSamplesPerSec=5.66079743690004, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:46:53,853] [INFO] [timer.py:197:stop] 0/2198, RunningAvgSamplesPerSec=6.326654620094145, CurrSamplesPerSec=5.713162788862371, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:47:05,232] [INFO] [logging.py:68:log_dist] [Rank 0] step=1100, skipped=5, lr=[8.68e-06], mom=[[0.9, 0.999]] [2022-12-16 23:47:05,237] [INFO] 
[timer.py:197:stop] 0/2200, RunningAvgSamplesPerSec=6.326613196775679, CurrSamplesPerSec=5.610251308986749, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0095, 'learning_rate': 8.68e-06, 'epoch': 4.66} [2022-12-16 23:47:16,813] [INFO] [timer.py:197:stop] 0/2202, RunningAvgSamplesPerSec=6.326616301706529, CurrSamplesPerSec=5.687317901155183, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:47:28,304] [INFO] [timer.py:197:stop] 0/2204, RunningAvgSamplesPerSec=6.326636280550874, CurrSamplesPerSec=5.716672940546703, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:47:39,604] [INFO] [timer.py:197:stop] 0/2206, RunningAvgSamplesPerSec=6.326631705077604, CurrSamplesPerSec=5.670984383276135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:47:51,208] [INFO] [timer.py:197:stop] 0/2208, RunningAvgSamplesPerSec=6.326616045712276, CurrSamplesPerSec=5.666151714434629, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:48:02,736] [INFO] [timer.py:197:stop] 0/2210, RunningAvgSamplesPerSec=6.326617519714017, CurrSamplesPerSec=5.679849455627446, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:48:14,047] [INFO] [timer.py:197:stop] 0/2212, RunningAvgSamplesPerSec=6.326616544719444, CurrSamplesPerSec=5.684408671940497, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:48:25,450] [INFO] [timer.py:197:stop] 0/2214, RunningAvgSamplesPerSec=6.326636718841098, CurrSamplesPerSec=5.7177756649699925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:48:36,769] [INFO] [timer.py:197:stop] 0/2216, RunningAvgSamplesPerSec=6.326639581809611, CurrSamplesPerSec=5.690913384163466, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:48:48,371] [INFO] [timer.py:197:stop] 0/2218, RunningAvgSamplesPerSec=6.326473489332864, CurrSamplesPerSec=5.398720825015092, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:48:59,676] [INFO] [logging.py:68:log_dist] [Rank 0] step=1110, skipped=5, lr=[8.657777777777778e-06], 
mom=[[0.9, 0.999]] [2022-12-16 23:48:59,677] [INFO] [timer.py:197:stop] 0/2220, RunningAvgSamplesPerSec=6.326484511281279, CurrSamplesPerSec=5.696579392761181, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:49:11,062] [INFO] [timer.py:197:stop] 0/2222, RunningAvgSamplesPerSec=6.326506791798435, CurrSamplesPerSec=5.715751001795795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:49:22,510] [INFO] [timer.py:197:stop] 0/2224, RunningAvgSamplesPerSec=6.32643017700071, CurrSamplesPerSec=5.538575914024925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:49:34,012] [INFO] [timer.py:197:stop] 0/2226, RunningAvgSamplesPerSec=6.326427190854699, CurrSamplesPerSec=5.682189130264888, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:49:45,549] [INFO] [timer.py:197:stop] 0/2228, RunningAvgSamplesPerSec=6.32642059283797, CurrSamplesPerSec=5.6687824932599185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:49:57,038] [INFO] [timer.py:197:stop] 0/2230, RunningAvgSamplesPerSec=6.326329693673774, CurrSamplesPerSec=5.519171447397812, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:50:08,352] [INFO] [timer.py:197:stop] 0/2232, RunningAvgSamplesPerSec=6.326326483515551, CurrSamplesPerSec=5.688925299229948, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:50:19,681] [INFO] [timer.py:197:stop] 0/2234, RunningAvgSamplesPerSec=6.326324285237307, CurrSamplesPerSec=5.697434693184849, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:50:31,088] [INFO] [timer.py:197:stop] 0/2236, RunningAvgSamplesPerSec=6.326270862338546, CurrSamplesPerSec=5.5946294316419705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:50:42,394] [INFO] [timer.py:197:stop] 0/2238, RunningAvgSamplesPerSec=6.326283074554998, CurrSamplesPerSec=5.682271402610858, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:50:53,976] [INFO] [logging.py:68:log_dist] [Rank 0] step=1120, skipped=5, lr=[8.635555555555555e-06], mom=[[0.9, 
0.999]] [2022-12-16 23:50:53,978] [INFO] [timer.py:197:stop] 0/2240, RunningAvgSamplesPerSec=6.326285225238069, CurrSamplesPerSec=5.684244487296989, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:51:05,362] [INFO] [timer.py:197:stop] 0/2242, RunningAvgSamplesPerSec=6.326285434003683, CurrSamplesPerSec=5.705309364393889, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:51:16,685] [INFO] [timer.py:197:stop] 0/2244, RunningAvgSamplesPerSec=6.326280138057781, CurrSamplesPerSec=5.686927277705469, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:51:28,045] [INFO] [timer.py:197:stop] 0/2246, RunningAvgSamplesPerSec=6.326284228598904, CurrSamplesPerSec=5.698377344512056, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:51:39,504] [INFO] [timer.py:197:stop] 0/2248, RunningAvgSamplesPerSec=6.32628714792387, CurrSamplesPerSec=5.695185636619195, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:51:50,789] [INFO] [timer.py:197:stop] 0/2250, RunningAvgSamplesPerSec=6.326311280608016, CurrSamplesPerSec=5.715411222963278, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0109, 'learning_rate': 8.624444444444446e-06, 'epoch': 4.77} [2022-12-16 23:52:02,112] [INFO] [timer.py:197:stop] 0/2252, RunningAvgSamplesPerSec=6.326315721304854, CurrSamplesPerSec=5.688322056869689, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:52:13,595] [INFO] [timer.py:197:stop] 0/2254, RunningAvgSamplesPerSec=6.326311919093778, CurrSamplesPerSec=5.68811931708187, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:52:24,891] [INFO] [timer.py:197:stop] 0/2256, RunningAvgSamplesPerSec=6.3263219325412035, CurrSamplesPerSec=5.695825384137346, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:52:36,162] [INFO] [timer.py:197:stop] 0/2258, RunningAvgSamplesPerSec=6.3263445079459055, CurrSamplesPerSec=5.699815264594685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:52:47,467] [INFO] [logging.py:68:log_dist] [Rank 0] 
step=1130, skipped=5, lr=[8.613333333333333e-06], mom=[[0.9, 0.999]] [2022-12-16 23:52:47,468] [INFO] [timer.py:197:stop] 0/2260, RunningAvgSamplesPerSec=6.326355199772841, CurrSamplesPerSec=5.714679715988128, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:52:58,766] [INFO] [timer.py:197:stop] 0/2262, RunningAvgSamplesPerSec=6.3263710788386165, CurrSamplesPerSec=5.703612710407144, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:53:10,048] [INFO] [timer.py:197:stop] 0/2264, RunningAvgSamplesPerSec=6.32638719408835, CurrSamplesPerSec=5.7117968816261335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:53:21,713] [INFO] [timer.py:197:stop] 0/2266, RunningAvgSamplesPerSec=6.326200249143209, CurrSamplesPerSec=5.697177132955527, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:53:33,031] [INFO] [timer.py:197:stop] 0/2268, RunningAvgSamplesPerSec=6.3262049339700885, CurrSamplesPerSec=5.700018839004315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:53:44,333] [INFO] [timer.py:197:stop] 0/2270, RunningAvgSamplesPerSec=6.326220176723921, CurrSamplesPerSec=5.686674762540661, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:53:55,857] [INFO] [timer.py:197:stop] 0/2272, RunningAvgSamplesPerSec=6.326179349106179, CurrSamplesPerSec=5.696734619054877, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:54:07,129] [INFO] [timer.py:197:stop] 0/2274, RunningAvgSamplesPerSec=6.326192560287468, CurrSamplesPerSec=5.71834692070838, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:54:18,372] [INFO] [timer.py:197:stop] 0/2276, RunningAvgSamplesPerSec=6.326232118597809, CurrSamplesPerSec=5.741100555354587, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:54:29,739] [INFO] [timer.py:197:stop] 0/2278, RunningAvgSamplesPerSec=6.326210908425242, CurrSamplesPerSec=5.7071654926568005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:54:41,057] [INFO] [logging.py:68:log_dist] [Rank 0] step=1140, 
skipped=5, lr=[8.591111111111112e-06], mom=[[0.9, 0.999]] [2022-12-16 23:54:41,058] [INFO] [timer.py:197:stop] 0/2280, RunningAvgSamplesPerSec=6.3262178608961595, CurrSamplesPerSec=5.7121705083245375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:54:52,376] [INFO] [timer.py:197:stop] 0/2282, RunningAvgSamplesPerSec=6.3262245491701075, CurrSamplesPerSec=5.720133044142513, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:55:03,710] [INFO] [timer.py:197:stop] 0/2284, RunningAvgSamplesPerSec=6.32621352411815, CurrSamplesPerSec=5.684511472725911, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:55:14,977] [INFO] [timer.py:197:stop] 0/2286, RunningAvgSamplesPerSec=6.326238918068466, CurrSamplesPerSec=5.719674526174559, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:55:26,308] [INFO] [timer.py:197:stop] 0/2288, RunningAvgSamplesPerSec=6.3262380095475725, CurrSamplesPerSec=5.681550999090691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:55:37,984] [INFO] [timer.py:197:stop] 0/2290, RunningAvgSamplesPerSec=6.326233512446577, CurrSamplesPerSec=5.691795948502412, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:55:49,380] [INFO] [timer.py:197:stop] 0/2292, RunningAvgSamplesPerSec=6.326239816856204, CurrSamplesPerSec=5.708440324097912, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:56:00,697] [INFO] [timer.py:197:stop] 0/2294, RunningAvgSamplesPerSec=6.326241228905768, CurrSamplesPerSec=5.710928273024266, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:56:12,113] [INFO] [timer.py:197:stop] 0/2296, RunningAvgSamplesPerSec=6.326238293926612, CurrSamplesPerSec=5.679791769604584, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:56:23,643] [INFO] [timer.py:197:stop] 0/2298, RunningAvgSamplesPerSec=6.326238784711817, CurrSamplesPerSec=5.698628480539143, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:56:34,951] [INFO] [logging.py:68:log_dist] [Rank 0] step=1150, skipped=5, 
lr=[8.56888888888889e-06], mom=[[0.9, 0.999]] [2022-12-16 23:56:34,953] [INFO] [timer.py:197:stop] 0/2300, RunningAvgSamplesPerSec=6.326240334499451, CurrSamplesPerSec=5.6804126769895165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0102, 'learning_rate': 8.56888888888889e-06, 'epoch': 4.87} [2022-12-16 23:56:46,443] [INFO] [timer.py:197:stop] 0/2302, RunningAvgSamplesPerSec=6.326247554449026, CurrSamplesPerSec=5.690969124591848, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:56:57,778] [INFO] [timer.py:197:stop] 0/2304, RunningAvgSamplesPerSec=6.326248708829878, CurrSamplesPerSec=5.697551509866096, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:57:09,072] [INFO] [timer.py:197:stop] 0/2306, RunningAvgSamplesPerSec=6.326259242853581, CurrSamplesPerSec=5.697326345762504, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:57:20,381] [INFO] [timer.py:197:stop] 0/2308, RunningAvgSamplesPerSec=6.326269961524548, CurrSamplesPerSec=5.69958241847372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:57:31,664] [INFO] [timer.py:197:stop] 0/2310, RunningAvgSamplesPerSec=6.326284545954589, CurrSamplesPerSec=5.707692153291593, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:57:42,996] [INFO] [timer.py:197:stop] 0/2312, RunningAvgSamplesPerSec=6.326284080838427, CurrSamplesPerSec=5.6587606682746046, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:57:54,323] [INFO] [timer.py:197:stop] 0/2314, RunningAvgSamplesPerSec=6.326284805015643, CurrSamplesPerSec=5.685244668107953, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:58:05,615] [INFO] [timer.py:197:stop] 0/2316, RunningAvgSamplesPerSec=6.326290782416112, CurrSamplesPerSec=5.684297930523931, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:58:16,889] [INFO] [timer.py:197:stop] 0/2318, RunningAvgSamplesPerSec=6.326316142943657, CurrSamplesPerSec=5.723674224372936, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:58:28,266] 
[INFO] [logging.py:68:log_dist] [Rank 0] step=1160, skipped=5, lr=[8.546666666666667e-06], mom=[[0.9, 0.999]] [2022-12-16 23:58:28,267] [INFO] [timer.py:197:stop] 0/2320, RunningAvgSamplesPerSec=6.326323787028186, CurrSamplesPerSec=5.688317476381979, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:58:39,623] [INFO] [timer.py:197:stop] 0/2322, RunningAvgSamplesPerSec=6.326342023787668, CurrSamplesPerSec=5.71290112985152, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:58:50,962] [INFO] [timer.py:197:stop] 0/2324, RunningAvgSamplesPerSec=6.326356330123207, CurrSamplesPerSec=5.708964548907816, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:59:02,276] [INFO] [timer.py:197:stop] 0/2326, RunningAvgSamplesPerSec=6.326361865091982, CurrSamplesPerSec=5.6786080227710025, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:59:13,775] [INFO] [timer.py:197:stop] 0/2328, RunningAvgSamplesPerSec=6.326370235319124, CurrSamplesPerSec=5.678698840919953, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:59:25,300] [INFO] [timer.py:197:stop] 0/2330, RunningAvgSamplesPerSec=6.326394180628012, CurrSamplesPerSec=5.7203590393708605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:59:36,883] [INFO] [timer.py:197:stop] 0/2332, RunningAvgSamplesPerSec=6.32626253016581, CurrSamplesPerSec=5.445651602800221, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:59:48,388] [INFO] [timer.py:197:stop] 0/2334, RunningAvgSamplesPerSec=6.326292110693107, CurrSamplesPerSec=5.704136048880114, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-16 23:59:59,841] [INFO] [timer.py:197:stop] 0/2336, RunningAvgSamplesPerSec=6.32631299484464, CurrSamplesPerSec=5.711764553181547, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:00:11,419] [INFO] [timer.py:197:stop] 0/2338, RunningAvgSamplesPerSec=6.326181513763306, CurrSamplesPerSec=5.415030591285196, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:00:22,709] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=1170, skipped=5, lr=[8.524444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 00:00:22,711] [INFO] [timer.py:197:stop] 0/2340, RunningAvgSamplesPerSec=6.326202325976543, CurrSamplesPerSec=5.74276970829813, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:00:34,006] [INFO] [timer.py:197:stop] 0/2342, RunningAvgSamplesPerSec=6.32620579711919, CurrSamplesPerSec=5.721006406084096, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:00:45,502] [INFO] [timer.py:197:stop] 0/2344, RunningAvgSamplesPerSec=6.326118889510103, CurrSamplesPerSec=5.52402753944322, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:00:56,800] [INFO] [timer.py:197:stop] 0/2346, RunningAvgSamplesPerSec=6.326135872170397, CurrSamplesPerSec=5.710939936974358, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:01:08,272] [INFO] [timer.py:197:stop] 0/2348, RunningAvgSamplesPerSec=6.326128201303538, CurrSamplesPerSec=5.6852942769906845, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:01:19,855] [INFO] [timer.py:197:stop] 0/2350, RunningAvgSamplesPerSec=6.3260915621697915, CurrSamplesPerSec=5.697348595343359, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0114, 'learning_rate': 8.513333333333335e-06, 'epoch': 4.98} [2022-12-17 00:01:31,193] [INFO] [timer.py:197:stop] 0/2352, RunningAvgSamplesPerSec=6.326086616929433, CurrSamplesPerSec=5.684411320157631, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:01:42,756] [INFO] [timer.py:197:stop] 0/2354, RunningAvgSamplesPerSec=6.326080971833527, CurrSamplesPerSec=5.689428581287169, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:01:54,292] [INFO] [timer.py:197:stop] 0/2356, RunningAvgSamplesPerSec=6.3260681795923155, CurrSamplesPerSec=5.702536278019622, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:02:05,644] [INFO] [timer.py:197:stop] 0/2358, RunningAvgSamplesPerSec=6.326060427681191, CurrSamplesPerSec=5.680715367471306, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:02:14,121] [INFO] [logging.py:68:log_dist] [Rank 0] step=1180, skipped=5, lr=[8.502222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 00:02:14,123] [INFO] [timer.py:197:stop] 0/2360, RunningAvgSamplesPerSec=6.327393469676367, CurrSamplesPerSec=10.255969396928553, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:02:25,787] [INFO] [timer.py:197:stop] 0/2362, RunningAvgSamplesPerSec=6.327398217095025, CurrSamplesPerSec=5.707732202892349, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:02:37,136] [INFO] [timer.py:197:stop] 0/2364, RunningAvgSamplesPerSec=6.327384932114008, CurrSamplesPerSec=5.666322510482579, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:02:48,647] [INFO] [timer.py:197:stop] 0/2366, RunningAvgSamplesPerSec=6.327379114654407, CurrSamplesPerSec=5.681089748134514, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:03:00,366] [INFO] [timer.py:197:stop] 0/2368, RunningAvgSamplesPerSec=6.327323722516558, CurrSamplesPerSec=5.693677831984537, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:03:11,697] [INFO] [timer.py:197:stop] 0/2370, RunningAvgSamplesPerSec=6.327321020826525, CurrSamplesPerSec=5.679773983317212, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:03:23,253] [INFO] [timer.py:197:stop] 0/2372, RunningAvgSamplesPerSec=6.327311747242629, CurrSamplesPerSec=5.681938478914834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:03:34,741] [INFO] [timer.py:197:stop] 0/2374, RunningAvgSamplesPerSec=6.327319351743848, CurrSamplesPerSec=5.691096535385634, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:03:46,059] [INFO] [timer.py:197:stop] 0/2376, RunningAvgSamplesPerSec=6.32732351833207, CurrSamplesPerSec=5.700592362818308, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:03:57,611] [INFO] [timer.py:197:stop] 0/2378, RunningAvgSamplesPerSec=6.327352322513417, CurrSamplesPerSec=5.712766175697562, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:04:09,069] [INFO] [logging.py:68:log_dist] [Rank 0] step=1190, skipped=5, lr=[8.48e-06], mom=[[0.9, 0.999]] [2022-12-17 00:04:09,071] [INFO] [timer.py:197:stop] 0/2380, RunningAvgSamplesPerSec=6.327346203150038, CurrSamplesPerSec=5.69740954073738, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:04:20,405] [INFO] [timer.py:197:stop] 0/2382, RunningAvgSamplesPerSec=6.327342004866964, CurrSamplesPerSec=5.68034271896564, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:04:31,873] [INFO] [timer.py:197:stop] 0/2384, RunningAvgSamplesPerSec=6.327349613345468, CurrSamplesPerSec=5.699608316218365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:04:43,357] [INFO] [timer.py:197:stop] 0/2386, RunningAvgSamplesPerSec=6.327357594697813, CurrSamplesPerSec=5.7072145141063535, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:04:54,649] [INFO] [timer.py:197:stop] 0/2388, RunningAvgSamplesPerSec=6.327373490088198, CurrSamplesPerSec=5.7044960664789475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:05:06,025] [INFO] [timer.py:197:stop] 0/2390, RunningAvgSamplesPerSec=6.327346305357408, CurrSamplesPerSec=5.649194330102887, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:05:17,540] [INFO] [timer.py:197:stop] 0/2392, RunningAvgSamplesPerSec=6.327245483140265, CurrSamplesPerSec=5.660994413395568, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:05:28,835] [INFO] [timer.py:197:stop] 0/2394, RunningAvgSamplesPerSec=6.327261384200125, CurrSamplesPerSec=5.714668036756897, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:05:40,427] [INFO] [timer.py:197:stop] 0/2396, RunningAvgSamplesPerSec=6.327119563889244, CurrSamplesPerSec=5.436875578629744, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:05:51,965] [INFO] [timer.py:197:stop] 0/2398, RunningAvgSamplesPerSec=6.327112445362932, CurrSamplesPerSec=5.679763167386114, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 00:06:03,374] [INFO] [logging.py:68:log_dist] [Rank 0] step=1200, skipped=5, lr=[8.457777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 00:06:03,376] [INFO] [timer.py:197:stop] 0/2400, RunningAvgSamplesPerSec=6.327118049781438, CurrSamplesPerSec=5.7093232333914035, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0097, 'learning_rate': 8.457777777777778e-06, 'epoch': 5.08} [2022-12-17 00:06:15,117] [INFO] [timer.py:197:stop] 0/2402, RunningAvgSamplesPerSec=6.326904179662763, CurrSamplesPerSec=5.29673078600285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:06:26,595] [INFO] [timer.py:197:stop] 0/2404, RunningAvgSamplesPerSec=6.3268834712487205, CurrSamplesPerSec=5.646651775835152, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:06:38,108] [INFO] [timer.py:197:stop] 0/2406, RunningAvgSamplesPerSec=6.326888160520098, CurrSamplesPerSec=5.690182585796365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:06:49,777] [INFO] [timer.py:197:stop] 0/2408, RunningAvgSamplesPerSec=6.326730764415, CurrSamplesPerSec=5.387032140158966, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:07:01,299] [INFO] [timer.py:197:stop] 0/2410, RunningAvgSamplesPerSec=6.326726621316895, CurrSamplesPerSec=5.6720043517375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:07:12,693] [INFO] [timer.py:197:stop] 0/2412, RunningAvgSamplesPerSec=6.326735800305985, CurrSamplesPerSec=5.697340614495457, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:07:24,145] [INFO] [timer.py:197:stop] 0/2414, RunningAvgSamplesPerSec=6.326685750989741, CurrSamplesPerSec=5.598387443582945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:07:35,424] [INFO] [timer.py:197:stop] 0/2416, RunningAvgSamplesPerSec=6.326691411324269, CurrSamplesPerSec=5.688779660662067, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:07:46,756] [INFO] [timer.py:197:stop] 0/2418, RunningAvgSamplesPerSec=6.326693198828202, 
CurrSamplesPerSec=5.704795752146189, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:07:58,143] [INFO] [logging.py:68:log_dist] [Rank 0] step=1210, skipped=5, lr=[8.435555555555555e-06], mom=[[0.9, 0.999]] [2022-12-17 00:07:58,145] [INFO] [timer.py:197:stop] 0/2420, RunningAvgSamplesPerSec=6.3266655254786315, CurrSamplesPerSec=5.633763495373306, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:08:09,440] [INFO] [timer.py:197:stop] 0/2422, RunningAvgSamplesPerSec=6.326681867189228, CurrSamplesPerSec=5.711533646362335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:08:20,823] [INFO] [timer.py:197:stop] 0/2424, RunningAvgSamplesPerSec=6.326712186913518, CurrSamplesPerSec=5.7466206883559385, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:08:32,438] [INFO] [timer.py:197:stop] 0/2426, RunningAvgSamplesPerSec=6.326729864602853, CurrSamplesPerSec=5.731897997137077, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:08:43,746] [INFO] [timer.py:197:stop] 0/2428, RunningAvgSamplesPerSec=6.326729675312963, CurrSamplesPerSec=5.687243194186456, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:08:55,189] [INFO] [timer.py:197:stop] 0/2430, RunningAvgSamplesPerSec=6.326737809957682, CurrSamplesPerSec=5.701821627330498, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:09:06,855] [INFO] [timer.py:197:stop] 0/2432, RunningAvgSamplesPerSec=6.326733080994904, CurrSamplesPerSec=5.672367276741743, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:09:18,164] [INFO] [timer.py:197:stop] 0/2434, RunningAvgSamplesPerSec=6.326742221811101, CurrSamplesPerSec=5.707480991963551, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:09:29,558] [INFO] [timer.py:197:stop] 0/2436, RunningAvgSamplesPerSec=6.326750724338014, CurrSamplesPerSec=5.690528539573331, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:09:40,932] [INFO] [timer.py:197:stop] 0/2438, RunningAvgSamplesPerSec=6.326724669497559, 
CurrSamplesPerSec=5.690591751851742, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:09:52,213] [INFO] [logging.py:68:log_dist] [Rank 0] step=1220, skipped=5, lr=[8.413333333333335e-06], mom=[[0.9, 0.999]] [2022-12-17 00:09:52,214] [INFO] [timer.py:197:stop] 0/2440, RunningAvgSamplesPerSec=6.326738501522213, CurrSamplesPerSec=5.702198068450731, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:10:03,515] [INFO] [timer.py:197:stop] 0/2442, RunningAvgSamplesPerSec=6.326750841170004, CurrSamplesPerSec=5.698653885705367, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:10:14,860] [INFO] [timer.py:197:stop] 0/2444, RunningAvgSamplesPerSec=6.326740054037859, CurrSamplesPerSec=5.718250201156127, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:10:26,173] [INFO] [timer.py:197:stop] 0/2446, RunningAvgSamplesPerSec=6.3267458592431325, CurrSamplesPerSec=5.7044698818484925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:10:37,439] [INFO] [timer.py:197:stop] 0/2448, RunningAvgSamplesPerSec=6.326768357244894, CurrSamplesPerSec=5.727955490702307, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:10:48,891] [INFO] [timer.py:197:stop] 0/2450, RunningAvgSamplesPerSec=6.32674730298158, CurrSamplesPerSec=5.709237018720549, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0075, 'learning_rate': 8.402222222222223e-06, 'epoch': 5.19} [2022-12-17 00:11:00,220] [INFO] [timer.py:197:stop] 0/2452, RunningAvgSamplesPerSec=6.326746408929464, CurrSamplesPerSec=5.675316745076753, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:11:11,568] [INFO] [timer.py:197:stop] 0/2454, RunningAvgSamplesPerSec=6.326751113377327, CurrSamplesPerSec=5.6697918498087185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:11:22,921] [INFO] [timer.py:197:stop] 0/2456, RunningAvgSamplesPerSec=6.326768920915674, CurrSamplesPerSec=5.7107800478084885, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:11:34,261] [INFO] 
[timer.py:197:stop] 0/2458, RunningAvgSamplesPerSec=6.326777253114961, CurrSamplesPerSec=5.693119946167675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:11:45,714] [INFO] [logging.py:68:log_dist] [Rank 0] step=1230, skipped=5, lr=[8.391111111111112e-06], mom=[[0.9, 0.999]] [2022-12-17 00:11:45,728] [INFO] [timer.py:197:stop] 0/2460, RunningAvgSamplesPerSec=6.326722606926644, CurrSamplesPerSec=5.568440852071338, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:11:57,328] [INFO] [timer.py:197:stop] 0/2462, RunningAvgSamplesPerSec=6.326719706587058, CurrSamplesPerSec=5.700025616999359, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:12:08,882] [INFO] [timer.py:197:stop] 0/2464, RunningAvgSamplesPerSec=6.326736640937212, CurrSamplesPerSec=5.725507396596043, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:12:20,388] [INFO] [timer.py:197:stop] 0/2466, RunningAvgSamplesPerSec=6.326662702925947, CurrSamplesPerSec=5.535536232496128, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:12:31,932] [INFO] [timer.py:197:stop] 0/2468, RunningAvgSamplesPerSec=6.32667441348783, CurrSamplesPerSec=5.713611263948235, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:12:43,315] [INFO] [timer.py:197:stop] 0/2470, RunningAvgSamplesPerSec=6.326702594844601, CurrSamplesPerSec=5.738342133171381, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:12:54,683] [INFO] [timer.py:197:stop] 0/2472, RunningAvgSamplesPerSec=6.3266968094008575, CurrSamplesPerSec=5.688689725337002, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:13:06,234] [INFO] [timer.py:197:stop] 0/2474, RunningAvgSamplesPerSec=6.326702223524954, CurrSamplesPerSec=5.712893834869376, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:13:17,816] [INFO] [timer.py:197:stop] 0/2476, RunningAvgSamplesPerSec=6.326706690468993, CurrSamplesPerSec=5.693369169956182, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:13:29,361] [INFO] 
[timer.py:197:stop] 0/2478, RunningAvgSamplesPerSec=6.326604120041126, CurrSamplesPerSec=5.5015015024862315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:13:40,781] [INFO] [logging.py:68:log_dist] [Rank 0] step=1240, skipped=5, lr=[8.36888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 00:13:40,783] [INFO] [timer.py:197:stop] 0/2480, RunningAvgSamplesPerSec=6.326613241704568, CurrSamplesPerSec=5.703245776278132, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:13:52,225] [INFO] [timer.py:197:stop] 0/2482, RunningAvgSamplesPerSec=6.326629215160153, CurrSamplesPerSec=5.705985834359169, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:14:03,577] [INFO] [timer.py:197:stop] 0/2484, RunningAvgSamplesPerSec=6.3266315522699, CurrSamplesPerSec=5.684307800779385, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:14:15,106] [INFO] [timer.py:197:stop] 0/2486, RunningAvgSamplesPerSec=6.326633090727117, CurrSamplesPerSec=5.702784146178658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:14:26,547] [INFO] [timer.py:197:stop] 0/2488, RunningAvgSamplesPerSec=6.326633770679098, CurrSamplesPerSec=5.690413940741107, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:14:37,978] [INFO] [timer.py:197:stop] 0/2490, RunningAvgSamplesPerSec=6.326596239594797, CurrSamplesPerSec=5.607375858672685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:14:49,487] [INFO] [timer.py:197:stop] 0/2492, RunningAvgSamplesPerSec=6.326614834209527, CurrSamplesPerSec=5.7160601480076645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:15:00,811] [INFO] [timer.py:197:stop] 0/2494, RunningAvgSamplesPerSec=6.326637863759786, CurrSamplesPerSec=5.725856193596088, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:15:12,347] [INFO] [timer.py:197:stop] 0/2496, RunningAvgSamplesPerSec=6.3265480474100535, CurrSamplesPerSec=5.50056176595693, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:15:23,670] [INFO] 
[timer.py:197:stop] 0/2498, RunningAvgSamplesPerSec=6.326564531801721, CurrSamplesPerSec=5.718744552837693, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:15:35,029] [INFO] [logging.py:68:log_dist] [Rank 0] step=1250, skipped=5, lr=[8.346666666666668e-06], mom=[[0.9, 0.999]] [2022-12-17 00:15:35,030] [INFO] [timer.py:197:stop] 0/2500, RunningAvgSamplesPerSec=6.326555577673725, CurrSamplesPerSec=5.695495462388746, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0063, 'learning_rate': 8.346666666666668e-06, 'epoch': 5.3} [2022-12-17 00:15:46,532] [INFO] [timer.py:197:stop] 0/2502, RunningAvgSamplesPerSec=6.326484923630128, CurrSamplesPerSec=5.53508263380697, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:15:57,765] [INFO] [timer.py:197:stop] 0/2504, RunningAvgSamplesPerSec=6.3265099226472925, CurrSamplesPerSec=5.736485042851482, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:16:09,067] [INFO] [timer.py:197:stop] 0/2506, RunningAvgSamplesPerSec=6.326523949412188, CurrSamplesPerSec=5.719076686267754, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:16:20,454] [INFO] [timer.py:197:stop] 0/2508, RunningAvgSamplesPerSec=6.326533116804953, CurrSamplesPerSec=5.689745740535148, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:16:31,727] [INFO] [timer.py:197:stop] 0/2510, RunningAvgSamplesPerSec=6.326552905082487, CurrSamplesPerSec=5.716908159224934, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:16:43,073] [INFO] [timer.py:197:stop] 0/2512, RunningAvgSamplesPerSec=6.326582937970287, CurrSamplesPerSec=5.723374992606862, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:16:54,613] [INFO] [timer.py:197:stop] 0/2514, RunningAvgSamplesPerSec=6.326550641893863, CurrSamplesPerSec=5.66221191290154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:17:05,946] [INFO] [timer.py:197:stop] 0/2516, RunningAvgSamplesPerSec=6.326549027844262, CurrSamplesPerSec=5.673992379000022, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 00:17:17,284] [INFO] [timer.py:197:stop] 0/2518, RunningAvgSamplesPerSec=6.326558787750363, CurrSamplesPerSec=5.698325087774245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:17:28,836] [INFO] [logging.py:68:log_dist] [Rank 0] step=1260, skipped=5, lr=[8.324444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 00:17:28,838] [INFO] [timer.py:197:stop] 0/2520, RunningAvgSamplesPerSec=6.326540122489599, CurrSamplesPerSec=5.690654482982018, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:17:40,139] [INFO] [timer.py:197:stop] 0/2522, RunningAvgSamplesPerSec=6.326553458397724, CurrSamplesPerSec=5.701011744954114, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:17:51,655] [INFO] [timer.py:197:stop] 0/2524, RunningAvgSamplesPerSec=6.3265404713512865, CurrSamplesPerSec=5.667898194852396, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:18:03,195] [INFO] [timer.py:197:stop] 0/2526, RunningAvgSamplesPerSec=6.326507994869215, CurrSamplesPerSec=5.68231398309392, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:18:14,498] [INFO] [timer.py:197:stop] 0/2528, RunningAvgSamplesPerSec=6.326522367660754, CurrSamplesPerSec=5.688211403993173, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:18:25,817] [INFO] [timer.py:197:stop] 0/2530, RunningAvgSamplesPerSec=6.32653360591826, CurrSamplesPerSec=5.712521815482252, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:18:37,184] [INFO] [timer.py:197:stop] 0/2532, RunningAvgSamplesPerSec=6.326514255985787, CurrSamplesPerSec=5.717885522347397, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:18:48,479] [INFO] [timer.py:197:stop] 0/2534, RunningAvgSamplesPerSec=6.3265339901958315, CurrSamplesPerSec=5.713119501609359, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:18:59,747] [INFO] [timer.py:197:stop] 0/2536, RunningAvgSamplesPerSec=6.326563020917872, CurrSamplesPerSec=5.7426670009695355, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 00:19:11,112] [INFO] [timer.py:197:stop] 0/2538, RunningAvgSamplesPerSec=6.326545038393981, CurrSamplesPerSec=5.721956142352332, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:19:22,447] [INFO] [logging.py:68:log_dist] [Rank 0] step=1270, skipped=5, lr=[8.302222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 00:19:22,449] [INFO] [timer.py:197:stop] 0/2540, RunningAvgSamplesPerSec=6.326540286848705, CurrSamplesPerSec=5.686737648285296, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:19:33,717] [INFO] [timer.py:197:stop] 0/2542, RunningAvgSamplesPerSec=6.326562698910825, CurrSamplesPerSec=5.721906135533029, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:19:45,369] [INFO] [timer.py:197:stop] 0/2544, RunningAvgSamplesPerSec=6.326403229462973, CurrSamplesPerSec=5.661712001273591, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:19:56,656] [INFO] [timer.py:197:stop] 0/2546, RunningAvgSamplesPerSec=6.32642389843164, CurrSamplesPerSec=5.701724981548307, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:20:07,953] [INFO] [timer.py:197:stop] 0/2548, RunningAvgSamplesPerSec=6.326438972781323, CurrSamplesPerSec=5.698616866824313, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:20:19,484] [INFO] [timer.py:197:stop] 0/2550, RunningAvgSamplesPerSec=6.326450768126824, CurrSamplesPerSec=5.7128675730879115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0063, 'learning_rate': 8.291111111111112e-06, 'epoch': 5.4} [2022-12-17 00:20:30,772] [INFO] [timer.py:197:stop] 0/2552, RunningAvgSamplesPerSec=6.3264719429827245, CurrSamplesPerSec=5.7136346138201715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:20:42,057] [INFO] [timer.py:197:stop] 0/2554, RunningAvgSamplesPerSec=6.326493149760229, CurrSamplesPerSec=5.718664875793541, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:20:53,716] [INFO] [timer.py:197:stop] 0/2556, 
RunningAvgSamplesPerSec=6.326505639720777, CurrSamplesPerSec=5.708368460113684, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:21:05,123] [INFO] [timer.py:197:stop] 0/2558, RunningAvgSamplesPerSec=6.3265038209555655, CurrSamplesPerSec=5.665879514564959, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:21:16,744] [INFO] [logging.py:68:log_dist] [Rank 0] step=1280, skipped=5, lr=[8.28e-06], mom=[[0.9, 0.999]] [2022-12-17 00:21:16,752] [INFO] [timer.py:197:stop] 0/2560, RunningAvgSamplesPerSec=6.326358618419738, CurrSamplesPerSec=5.413214403540145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:21:28,250] [INFO] [timer.py:197:stop] 0/2562, RunningAvgSamplesPerSec=6.326359770774308, CurrSamplesPerSec=5.694195483572104, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:21:39,600] [INFO] [timer.py:197:stop] 0/2564, RunningAvgSamplesPerSec=6.326349992478936, CurrSamplesPerSec=5.6756863343254045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:21:50,969] [INFO] [timer.py:197:stop] 0/2566, RunningAvgSamplesPerSec=6.326319567798299, CurrSamplesPerSec=5.643538377656659, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:22:02,666] [INFO] [timer.py:197:stop] 0/2568, RunningAvgSamplesPerSec=6.326323086683221, CurrSamplesPerSec=5.697887233328356, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:22:14,206] [INFO] [timer.py:197:stop] 0/2570, RunningAvgSamplesPerSec=6.326330355222502, CurrSamplesPerSec=5.7042182307129705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:22:25,738] [INFO] [timer.py:197:stop] 0/2572, RunningAvgSamplesPerSec=6.326225319823644, CurrSamplesPerSec=5.473016049069588, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:22:37,126] [INFO] [timer.py:197:stop] 0/2574, RunningAvgSamplesPerSec=6.326225277051989, CurrSamplesPerSec=5.6848013572597695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:22:48,440] [INFO] [timer.py:197:stop] 0/2576, 
RunningAvgSamplesPerSec=6.326231527655515, CurrSamplesPerSec=5.674560437142187, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:22:59,916] [INFO] [timer.py:197:stop] 0/2578, RunningAvgSamplesPerSec=6.326152255092386, CurrSamplesPerSec=5.522841235953951, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:23:11,199] [INFO] [logging.py:68:log_dist] [Rank 0] step=1290, skipped=5, lr=[8.25777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 00:23:11,201] [INFO] [timer.py:197:stop] 0/2580, RunningAvgSamplesPerSec=6.326175072075225, CurrSamplesPerSec=5.720998846531288, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:23:22,498] [INFO] [timer.py:197:stop] 0/2582, RunningAvgSamplesPerSec=6.326183771322497, CurrSamplesPerSec=5.698068414658625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:23:33,927] [INFO] [timer.py:197:stop] 0/2584, RunningAvgSamplesPerSec=6.326139785337271, CurrSamplesPerSec=5.58557952242445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:23:45,167] [INFO] [timer.py:197:stop] 0/2586, RunningAvgSamplesPerSec=6.326179241497298, CurrSamplesPerSec=5.7387481945854235, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:23:56,451] [INFO] [timer.py:197:stop] 0/2588, RunningAvgSamplesPerSec=6.32620099438922, CurrSamplesPerSec=5.721770755603695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:24:07,781] [INFO] [timer.py:197:stop] 0/2590, RunningAvgSamplesPerSec=6.326192997876933, CurrSamplesPerSec=5.647460068432117, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:24:19,159] [INFO] [timer.py:197:stop] 0/2592, RunningAvgSamplesPerSec=6.326204338875871, CurrSamplesPerSec=5.710935805986585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:24:30,481] [INFO] [timer.py:197:stop] 0/2594, RunningAvgSamplesPerSec=6.326213224289392, CurrSamplesPerSec=5.6991071035421985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:24:41,792] [INFO] [timer.py:197:stop] 0/2596, 
RunningAvgSamplesPerSec=6.326214609800813, CurrSamplesPerSec=5.689834744426627, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:24:53,083] [INFO] [timer.py:197:stop] 0/2598, RunningAvgSamplesPerSec=6.3262259837086505, CurrSamplesPerSec=5.7008393355908, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:25:04,381] [INFO] [logging.py:68:log_dist] [Rank 0] step=1300, skipped=5, lr=[8.235555555555557e-06], mom=[[0.9, 0.999]] [2022-12-17 00:25:04,383] [INFO] [timer.py:197:stop] 0/2600, RunningAvgSamplesPerSec=6.3262398954655445, CurrSamplesPerSec=5.711441774946321, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0057, 'learning_rate': 8.235555555555557e-06, 'epoch': 5.51} [2022-12-17 00:25:15,742] [INFO] [timer.py:197:stop] 0/2602, RunningAvgSamplesPerSec=6.326244054453919, CurrSamplesPerSec=5.675839944017053, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:25:27,029] [INFO] [timer.py:197:stop] 0/2604, RunningAvgSamplesPerSec=6.326264010208493, CurrSamplesPerSec=5.715521719669567, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:25:38,582] [INFO] [timer.py:197:stop] 0/2606, RunningAvgSamplesPerSec=6.326260293048078, CurrSamplesPerSec=5.690133374032339, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:25:50,225] [INFO] [timer.py:197:stop] 0/2608, RunningAvgSamplesPerSec=6.3262267886673405, CurrSamplesPerSec=5.683333218298441, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:26:01,512] [INFO] [timer.py:197:stop] 0/2610, RunningAvgSamplesPerSec=6.326239613463648, CurrSamplesPerSec=5.710197913057936, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:26:12,880] [INFO] [timer.py:197:stop] 0/2612, RunningAvgSamplesPerSec=6.32625914149536, CurrSamplesPerSec=5.713643370071351, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:26:24,506] [INFO] [timer.py:197:stop] 0/2614, RunningAvgSamplesPerSec=6.32611627280062, CurrSamplesPerSec=5.696622671515938, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-17 00:26:35,814] [INFO] [timer.py:197:stop] 0/2616, RunningAvgSamplesPerSec=6.326125500638407, CurrSamplesPerSec=5.70426065588973, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:26:47,096] [INFO] [timer.py:197:stop] 0/2618, RunningAvgSamplesPerSec=6.326139626599832, CurrSamplesPerSec=5.702213815140462, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:26:58,708] [INFO] [logging.py:68:log_dist] [Rank 0] step=1310, skipped=5, lr=[8.213333333333335e-06], mom=[[0.9, 0.999]] [2022-12-17 00:26:58,710] [INFO] [timer.py:197:stop] 0/2620, RunningAvgSamplesPerSec=6.326002105283931, CurrSamplesPerSec=5.712053091414867, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:27:10,158] [INFO] [timer.py:197:stop] 0/2622, RunningAvgSamplesPerSec=6.3259629171482885, CurrSamplesPerSec=5.592408362554529, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:27:21,571] [INFO] [timer.py:197:stop] 0/2624, RunningAvgSamplesPerSec=6.325924896719555, CurrSamplesPerSec=5.606229828905596, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:27:33,009] [INFO] [timer.py:197:stop] 0/2626, RunningAvgSamplesPerSec=6.325937791742814, CurrSamplesPerSec=5.702413926803422, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:27:44,296] [INFO] [timer.py:197:stop] 0/2628, RunningAvgSamplesPerSec=6.325958003199458, CurrSamplesPerSec=5.73155556162893, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:27:55,576] [INFO] [timer.py:197:stop] 0/2630, RunningAvgSamplesPerSec=6.325981450593395, CurrSamplesPerSec=5.719802494288624, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:28:06,852] [INFO] [timer.py:197:stop] 0/2632, RunningAvgSamplesPerSec=6.326006891574107, CurrSamplesPerSec=5.729226910935692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:28:18,152] [INFO] [timer.py:197:stop] 0/2634, RunningAvgSamplesPerSec=6.326020555896606, CurrSamplesPerSec=5.698490086849737, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
00:28:29,411] [INFO] [timer.py:197:stop] 0/2636, RunningAvgSamplesPerSec=6.326053117175442, CurrSamplesPerSec=5.732383449386664, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:28:40,700] [INFO] [timer.py:197:stop] 0/2638, RunningAvgSamplesPerSec=6.32607484050823, CurrSamplesPerSec=5.717845086546815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:28:51,958] [INFO] [logging.py:68:log_dist] [Rank 0] step=1320, skipped=5, lr=[8.191111111111112e-06], mom=[[0.9, 0.999]] [2022-12-17 00:28:51,959] [INFO] [timer.py:197:stop] 0/2640, RunningAvgSamplesPerSec=6.326099519405758, CurrSamplesPerSec=5.725360855912853, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:29:03,227] [INFO] [timer.py:197:stop] 0/2642, RunningAvgSamplesPerSec=6.326120596379347, CurrSamplesPerSec=5.725664691942976, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:29:14,824] [INFO] [timer.py:197:stop] 0/2644, RunningAvgSamplesPerSec=6.326140929669695, CurrSamplesPerSec=5.704167563757328, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:29:26,095] [INFO] [timer.py:197:stop] 0/2646, RunningAvgSamplesPerSec=6.32616732833622, CurrSamplesPerSec=5.720641863301964, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:29:37,643] [INFO] [timer.py:197:stop] 0/2648, RunningAvgSamplesPerSec=6.326194882392596, CurrSamplesPerSec=5.723545594812338, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:29:48,979] [INFO] [timer.py:197:stop] 0/2650, RunningAvgSamplesPerSec=6.326209225498618, CurrSamplesPerSec=5.724713724280205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0058, 'learning_rate': 8.18e-06, 'epoch': 5.61} [2022-12-17 00:30:00,270] [INFO] [timer.py:197:stop] 0/2652, RunningAvgSamplesPerSec=6.326229768870487, CurrSamplesPerSec=5.710958890994876, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:30:11,729] [INFO] [timer.py:197:stop] 0/2654, RunningAvgSamplesPerSec=6.326249240897963, CurrSamplesPerSec=5.708205558930267, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:30:23,103] [INFO] [timer.py:197:stop] 0/2656, RunningAvgSamplesPerSec=6.326252214933294, CurrSamplesPerSec=5.699375244988747, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:30:34,432] [INFO] [timer.py:197:stop] 0/2658, RunningAvgSamplesPerSec=6.326251581706483, CurrSamplesPerSec=5.664131887462741, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:30:45,723] [INFO] [logging.py:68:log_dist] [Rank 0] step=1330, skipped=5, lr=[8.16888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 00:30:45,725] [INFO] [timer.py:197:stop] 0/2660, RunningAvgSamplesPerSec=6.326267794800439, CurrSamplesPerSec=5.690308755138885, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:30:57,318] [INFO] [timer.py:197:stop] 0/2662, RunningAvgSamplesPerSec=6.326142217190136, CurrSamplesPerSec=5.70153291081011, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:31:08,612] [INFO] [timer.py:197:stop] 0/2664, RunningAvgSamplesPerSec=6.326157916158148, CurrSamplesPerSec=5.708166959267274, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:31:19,918] [INFO] [timer.py:197:stop] 0/2666, RunningAvgSamplesPerSec=6.3261677520262936, CurrSamplesPerSec=5.702544273430368, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:31:31,445] [INFO] [timer.py:197:stop] 0/2668, RunningAvgSamplesPerSec=6.326075930517208, CurrSamplesPerSec=5.737835066469297, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:31:42,765] [INFO] [timer.py:197:stop] 0/2670, RunningAvgSamplesPerSec=6.326079650124289, CurrSamplesPerSec=5.704554255407273, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:31:54,072] [INFO] [timer.py:197:stop] 0/2672, RunningAvgSamplesPerSec=6.326084022141423, CurrSamplesPerSec=5.679570891042092, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:32:05,392] [INFO] [timer.py:197:stop] 0/2674, RunningAvgSamplesPerSec=6.32608739737266, CurrSamplesPerSec=5.69966592139152, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:32:16,734] [INFO] [timer.py:197:stop] 0/2676, RunningAvgSamplesPerSec=6.3260808079396345, CurrSamplesPerSec=5.668778662460667, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:32:28,067] [INFO] [timer.py:197:stop] 0/2678, RunningAvgSamplesPerSec=6.326079861892401, CurrSamplesPerSec=5.675158364418903, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:32:39,429] [INFO] [logging.py:68:log_dist] [Rank 0] step=1340, skipped=5, lr=[8.146666666666668e-06], mom=[[0.9, 0.999]] [2022-12-17 00:32:39,431] [INFO] [timer.py:197:stop] 0/2680, RunningAvgSamplesPerSec=6.32606307682349, CurrSamplesPerSec=5.7083980795096085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:32:50,725] [INFO] [timer.py:197:stop] 0/2682, RunningAvgSamplesPerSec=6.326072935799696, CurrSamplesPerSec=5.679356998906212, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:33:02,471] [INFO] [timer.py:197:stop] 0/2684, RunningAvgSamplesPerSec=6.325883959083556, CurrSamplesPerSec=5.299005576723335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:33:13,813] [INFO] [timer.py:197:stop] 0/2686, RunningAvgSamplesPerSec=6.325889876937557, CurrSamplesPerSec=5.705633390227677, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:33:25,326] [INFO] [timer.py:197:stop] 0/2688, RunningAvgSamplesPerSec=6.3258965851993825, CurrSamplesPerSec=5.709519715426119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:33:36,701] [INFO] [timer.py:197:stop] 0/2690, RunningAvgSamplesPerSec=6.325876082499624, CurrSamplesPerSec=5.64687770381663, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:33:48,182] [INFO] [timer.py:197:stop] 0/2692, RunningAvgSamplesPerSec=6.325893590208427, CurrSamplesPerSec=5.724229081879015, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:33:59,565] [INFO] [timer.py:197:stop] 0/2694, RunningAvgSamplesPerSec=6.325900009531055, CurrSamplesPerSec=5.684401208796394, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:34:10,944] [INFO] [timer.py:197:stop] 0/2696, RunningAvgSamplesPerSec=6.325868582660771, CurrSamplesPerSec=5.625398530103999, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:34:22,205] [INFO] [timer.py:197:stop] 0/2698, RunningAvgSamplesPerSec=6.32588495236265, CurrSamplesPerSec=5.71918854325984, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:34:33,514] [INFO] [logging.py:68:log_dist] [Rank 0] step=1350, skipped=5, lr=[8.124444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 00:34:33,516] [INFO] [timer.py:197:stop] 0/2700, RunningAvgSamplesPerSec=6.325898947710833, CurrSamplesPerSec=5.706381991812721, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0065, 'learning_rate': 8.124444444444445e-06, 'epoch': 5.72} [2022-12-17 00:34:44,835] [INFO] [timer.py:197:stop] 0/2702, RunningAvgSamplesPerSec=6.325905253726988, CurrSamplesPerSec=5.686284468334417, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:34:56,149] [INFO] [timer.py:197:stop] 0/2704, RunningAvgSamplesPerSec=6.325912620762956, CurrSamplesPerSec=5.71820001547374, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:35:07,474] [INFO] [timer.py:197:stop] 0/2706, RunningAvgSamplesPerSec=6.325917059353624, CurrSamplesPerSec=5.686473585688181, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:35:18,765] [INFO] [timer.py:197:stop] 0/2708, RunningAvgSamplesPerSec=6.325928300743048, CurrSamplesPerSec=5.692940528208803, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:35:30,079] [INFO] [timer.py:197:stop] 0/2710, RunningAvgSamplesPerSec=6.325935598815497, CurrSamplesPerSec=5.698778737242706, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:35:41,360] [INFO] [timer.py:197:stop] 0/2712, RunningAvgSamplesPerSec=6.325953492621701, CurrSamplesPerSec=5.725917017592284, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:35:52,664] [INFO] [timer.py:197:stop] 0/2714, 
RunningAvgSamplesPerSec=6.3259665608292295, CurrSamplesPerSec=5.7136144258988875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:36:03,998] [INFO] [timer.py:197:stop] 0/2716, RunningAvgSamplesPerSec=6.325965339171459, CurrSamplesPerSec=5.69463711984461, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:36:15,336] [INFO] [timer.py:197:stop] 0/2718, RunningAvgSamplesPerSec=6.325963267000056, CurrSamplesPerSec=5.6800157902202875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:36:26,643] [INFO] [logging.py:68:log_dist] [Rank 0] step=1360, skipped=5, lr=[8.102222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 00:36:26,644] [INFO] [timer.py:197:stop] 0/2720, RunningAvgSamplesPerSec=6.325974389268866, CurrSamplesPerSec=5.712358434045041, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:36:38,000] [INFO] [timer.py:197:stop] 0/2722, RunningAvgSamplesPerSec=6.325966469616858, CurrSamplesPerSec=5.687886943472245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:36:49,304] [INFO] [timer.py:197:stop] 0/2724, RunningAvgSamplesPerSec=6.3259805196125285, CurrSamplesPerSec=5.710168274981378, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:37:00,609] [INFO] [timer.py:197:stop] 0/2726, RunningAvgSamplesPerSec=6.325991601930382, CurrSamplesPerSec=5.717981011568095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:37:11,924] [INFO] [timer.py:197:stop] 0/2728, RunningAvgSamplesPerSec=6.325997268890384, CurrSamplesPerSec=5.687382247117594, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:37:23,234] [INFO] [timer.py:197:stop] 0/2730, RunningAvgSamplesPerSec=6.326005740177389, CurrSamplesPerSec=5.691548793241471, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:37:34,542] [INFO] [timer.py:197:stop] 0/2732, RunningAvgSamplesPerSec=6.326015268521907, CurrSamplesPerSec=5.710145682228034, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:37:45,849] [INFO] [timer.py:197:stop] 0/2734, 
RunningAvgSamplesPerSec=6.326026052423808, CurrSamplesPerSec=5.69643166967437, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:37:57,147] [INFO] [timer.py:197:stop] 0/2736, RunningAvgSamplesPerSec=6.326034238434523, CurrSamplesPerSec=5.711207005865852, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:38:08,454] [INFO] [timer.py:197:stop] 0/2738, RunningAvgSamplesPerSec=6.326044722066875, CurrSamplesPerSec=5.714813057261718, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:38:19,744] [INFO] [logging.py:68:log_dist] [Rank 0] step=1370, skipped=5, lr=[8.08e-06], mom=[[0.9, 0.999]] [2022-12-17 00:38:19,746] [INFO] [timer.py:197:stop] 0/2740, RunningAvgSamplesPerSec=6.326062588023212, CurrSamplesPerSec=5.714517427614674, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:38:31,062] [INFO] [timer.py:197:stop] 0/2742, RunningAvgSamplesPerSec=6.326068755109588, CurrSamplesPerSec=5.6926896518027075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:38:42,377] [INFO] [timer.py:197:stop] 0/2744, RunningAvgSamplesPerSec=6.326077004074519, CurrSamplesPerSec=5.698056319417846, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:38:53,733] [INFO] [timer.py:197:stop] 0/2746, RunningAvgSamplesPerSec=6.326095865876457, CurrSamplesPerSec=5.708167930321577, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:39:05,083] [INFO] [timer.py:197:stop] 0/2748, RunningAvgSamplesPerSec=6.326088331147206, CurrSamplesPerSec=5.6702160544353335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:39:16,406] [INFO] [timer.py:197:stop] 0/2750, RunningAvgSamplesPerSec=6.32609210532745, CurrSamplesPerSec=5.6996581760903995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0059, 'learning_rate': 8.06888888888889e-06, 'epoch': 5.83} [2022-12-17 00:39:27,683] [INFO] [timer.py:197:stop] 0/2752, RunningAvgSamplesPerSec=6.326102930538578, CurrSamplesPerSec=5.700973726824695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
00:39:39,245] [INFO] [timer.py:197:stop] 0/2754, RunningAvgSamplesPerSec=6.326103142679591, CurrSamplesPerSec=5.6921966559580985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:39:50,546] [INFO] [timer.py:197:stop] 0/2756, RunningAvgSamplesPerSec=6.326109758767559, CurrSamplesPerSec=5.68873505431431, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:40:01,928] [INFO] [timer.py:197:stop] 0/2758, RunningAvgSamplesPerSec=6.326152357749966, CurrSamplesPerSec=5.717524543449726, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:40:13,229] [INFO] [logging.py:68:log_dist] [Rank 0] step=1380, skipped=5, lr=[8.057777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 00:40:13,231] [INFO] [timer.py:197:stop] 0/2760, RunningAvgSamplesPerSec=6.3261563606599145, CurrSamplesPerSec=5.686595494752139, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:40:24,553] [INFO] [timer.py:197:stop] 0/2762, RunningAvgSamplesPerSec=6.326160416652262, CurrSamplesPerSec=5.694735458470453, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:40:35,862] [INFO] [timer.py:197:stop] 0/2764, RunningAvgSamplesPerSec=6.326171019397204, CurrSamplesPerSec=5.696176617567328, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:40:47,152] [INFO] [timer.py:197:stop] 0/2766, RunningAvgSamplesPerSec=6.326189476157271, CurrSamplesPerSec=5.714541758117771, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:40:58,431] [INFO] [timer.py:197:stop] 0/2768, RunningAvgSamplesPerSec=6.3262123829027574, CurrSamplesPerSec=5.732181963257296, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:41:09,717] [INFO] [timer.py:197:stop] 0/2770, RunningAvgSamplesPerSec=6.3262342581050754, CurrSamplesPerSec=5.734106085065293, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:41:21,023] [INFO] [timer.py:197:stop] 0/2772, RunningAvgSamplesPerSec=6.326247134308609, CurrSamplesPerSec=5.71070205041586, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
00:41:32,419] [INFO] [timer.py:197:stop] 0/2774, RunningAvgSamplesPerSec=6.326217747933542, CurrSamplesPerSec=5.609474965653818, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:41:43,694] [INFO] [timer.py:197:stop] 0/2776, RunningAvgSamplesPerSec=6.326237779781224, CurrSamplesPerSec=5.714292623567177, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:41:54,989] [INFO] [timer.py:197:stop] 0/2778, RunningAvgSamplesPerSec=6.326246832178511, CurrSamplesPerSec=5.694157073152024, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:42:06,306] [INFO] [logging.py:68:log_dist] [Rank 0] step=1390, skipped=5, lr=[8.035555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 00:42:06,307] [INFO] [timer.py:197:stop] 0/2780, RunningAvgSamplesPerSec=6.326252131073057, CurrSamplesPerSec=5.698359441632837, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:42:17,627] [INFO] [timer.py:197:stop] 0/2782, RunningAvgSamplesPerSec=6.326263541773177, CurrSamplesPerSec=5.712742832923151, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:42:28,915] [INFO] [timer.py:197:stop] 0/2784, RunningAvgSamplesPerSec=6.3262820242871625, CurrSamplesPerSec=5.711086226394405, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:42:40,235] [INFO] [timer.py:197:stop] 0/2786, RunningAvgSamplesPerSec=6.326278470128036, CurrSamplesPerSec=5.689995633865518, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:42:51,531] [INFO] [timer.py:197:stop] 0/2788, RunningAvgSamplesPerSec=6.3262812600351, CurrSamplesPerSec=5.698626544916717, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:43:02,861] [INFO] [timer.py:197:stop] 0/2790, RunningAvgSamplesPerSec=6.326284751834609, CurrSamplesPerSec=5.682586802008966, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:43:14,152] [INFO] [timer.py:197:stop] 0/2792, RunningAvgSamplesPerSec=6.326300456502428, CurrSamplesPerSec=5.711068486580379, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:43:25,489] 
[INFO] [timer.py:197:stop] 0/2794, RunningAvgSamplesPerSec=6.326296045859089, CurrSamplesPerSec=5.68929401032924, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:43:36,844] [INFO] [timer.py:197:stop] 0/2796, RunningAvgSamplesPerSec=6.326283444929573, CurrSamplesPerSec=5.678466275380915, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:43:48,194] [INFO] [timer.py:197:stop] 0/2798, RunningAvgSamplesPerSec=6.326275113134897, CurrSamplesPerSec=5.681810996789058, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:43:59,488] [INFO] [logging.py:68:log_dist] [Rank 0] step=1400, skipped=5, lr=[8.013333333333333e-06], mom=[[0.9, 0.999]] [2022-12-17 00:43:59,490] [INFO] [timer.py:197:stop] 0/2800, RunningAvgSamplesPerSec=6.326278524053802, CurrSamplesPerSec=5.708140012642158, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0061, 'learning_rate': 8.013333333333333e-06, 'epoch': 5.93} [2022-12-17 00:44:10,818] [INFO] [timer.py:197:stop] 0/2802, RunningAvgSamplesPerSec=6.326272564841731, CurrSamplesPerSec=5.689947872632715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:44:22,144] [INFO] [timer.py:197:stop] 0/2804, RunningAvgSamplesPerSec=6.3262738518884944, CurrSamplesPerSec=5.6964355379390215, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:44:33,451] [INFO] [timer.py:197:stop] 0/2806, RunningAvgSamplesPerSec=6.326284669686423, CurrSamplesPerSec=5.71170111271115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:44:44,774] [INFO] [timer.py:197:stop] 0/2808, RunningAvgSamplesPerSec=6.326287380248273, CurrSamplesPerSec=5.694567777450047, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:44:56,077] [INFO] [timer.py:197:stop] 0/2810, RunningAvgSamplesPerSec=6.326298963116309, CurrSamplesPerSec=5.7054769510047585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:45:07,384] [INFO] [timer.py:197:stop] 0/2812, RunningAvgSamplesPerSec=6.326309438125834, CurrSamplesPerSec=5.707904058468435, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:45:18,694] [INFO] [timer.py:197:stop] 0/2814, RunningAvgSamplesPerSec=6.326318695543646, CurrSamplesPerSec=5.699163488592521, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:45:30,016] [INFO] [timer.py:197:stop] 0/2816, RunningAvgSamplesPerSec=6.326308895338787, CurrSamplesPerSec=5.6812999230670735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:45:41,403] [INFO] [timer.py:197:stop] 0/2818, RunningAvgSamplesPerSec=6.3263158441663165, CurrSamplesPerSec=5.697318123135298, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:45:52,683] [INFO] [logging.py:68:log_dist] [Rank 0] step=1410, skipped=5, lr=[7.991111111111111e-06], mom=[[0.9, 0.999]] [2022-12-17 00:45:52,684] [INFO] [timer.py:197:stop] 0/2820, RunningAvgSamplesPerSec=6.326330930150586, CurrSamplesPerSec=5.711552361272398, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:46:04,105] [INFO] [timer.py:197:stop] 0/2822, RunningAvgSamplesPerSec=6.326337358566065, CurrSamplesPerSec=5.7063317716262025, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:46:15,409] [INFO] [timer.py:197:stop] 0/2824, RunningAvgSamplesPerSec=6.326350499991001, CurrSamplesPerSec=5.700049824256106, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:46:26,712] [INFO] [timer.py:197:stop] 0/2826, RunningAvgSamplesPerSec=6.326362116471989, CurrSamplesPerSec=5.696192331017952, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:46:37,981] [INFO] [timer.py:197:stop] 0/2828, RunningAvgSamplesPerSec=6.326378782489265, CurrSamplesPerSec=5.72583909471339, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:46:49,286] [INFO] [timer.py:197:stop] 0/2830, RunningAvgSamplesPerSec=6.3263905912443725, CurrSamplesPerSec=5.712548803494018, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:46:57,752] [INFO] [timer.py:197:stop] 0/2832, RunningAvgSamplesPerSec=6.327500906774722, CurrSamplesPerSec=10.213131621964372, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:47:09,028] [INFO] [timer.py:197:stop] 0/2834, RunningAvgSamplesPerSec=6.327524589867564, CurrSamplesPerSec=5.7415259191163495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:47:20,330] [INFO] [timer.py:197:stop] 0/2836, RunningAvgSamplesPerSec=6.32753757473291, CurrSamplesPerSec=5.706063945559606, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:47:31,620] [INFO] [timer.py:197:stop] 0/2838, RunningAvgSamplesPerSec=6.32755530949647, CurrSamplesPerSec=5.708585999887545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:47:42,882] [INFO] [logging.py:68:log_dist] [Rank 0] step=1420, skipped=5, lr=[7.968888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 00:47:42,884] [INFO] [timer.py:197:stop] 0/2840, RunningAvgSamplesPerSec=6.3275774644451, CurrSamplesPerSec=5.717691143431064, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:47:54,164] [INFO] [timer.py:197:stop] 0/2842, RunningAvgSamplesPerSec=6.327600320149804, CurrSamplesPerSec=5.730524830517137, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:48:05,413] [INFO] [timer.py:197:stop] 0/2844, RunningAvgSamplesPerSec=6.327630379247216, CurrSamplesPerSec=5.741550480149671, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:48:16,683] [INFO] [timer.py:197:stop] 0/2846, RunningAvgSamplesPerSec=6.327656478388394, CurrSamplesPerSec=5.728110231499037, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:48:27,968] [INFO] [timer.py:197:stop] 0/2848, RunningAvgSamplesPerSec=6.327676846505894, CurrSamplesPerSec=5.711667570043478, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:48:39,248] [INFO] [timer.py:197:stop] 0/2850, RunningAvgSamplesPerSec=6.327698323490243, CurrSamplesPerSec=5.715694531454376, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0069, 'learning_rate': 7.957777777777779e-06, 'epoch': 6.04} [2022-12-17 00:48:50,688] [INFO] [timer.py:197:stop] 0/2852, 
RunningAvgSamplesPerSec=6.327681825148393, CurrSamplesPerSec=5.633257481872813, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:49:01,947] [INFO] [timer.py:197:stop] 0/2854, RunningAvgSamplesPerSec=6.327705434093166, CurrSamplesPerSec=5.713875177257864, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:49:13,201] [INFO] [timer.py:197:stop] 0/2856, RunningAvgSamplesPerSec=6.327737753470916, CurrSamplesPerSec=5.747293209022641, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:49:24,470] [INFO] [timer.py:197:stop] 0/2858, RunningAvgSamplesPerSec=6.327763769334866, CurrSamplesPerSec=5.725460258524562, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:49:35,694] [INFO] [logging.py:68:log_dist] [Rank 0] step=1430, skipped=5, lr=[7.946666666666666e-06], mom=[[0.9, 0.999]] [2022-12-17 00:49:35,696] [INFO] [timer.py:197:stop] 0/2860, RunningAvgSamplesPerSec=6.327800843423795, CurrSamplesPerSec=5.735983206986001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:49:46,948] [INFO] [timer.py:197:stop] 0/2862, RunningAvgSamplesPerSec=6.327833495419585, CurrSamplesPerSec=5.748474746050126, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:49:58,260] [INFO] [timer.py:197:stop] 0/2864, RunningAvgSamplesPerSec=6.327840557961029, CurrSamplesPerSec=5.688244430653757, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:50:09,571] [INFO] [timer.py:197:stop] 0/2866, RunningAvgSamplesPerSec=6.327847886071225, CurrSamplesPerSec=5.6982658162456055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:50:20,879] [INFO] [timer.py:197:stop] 0/2868, RunningAvgSamplesPerSec=6.327856789431566, CurrSamplesPerSec=5.690743032739117, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:50:32,132] [INFO] [timer.py:197:stop] 0/2870, RunningAvgSamplesPerSec=6.327882540236703, CurrSamplesPerSec=5.7294487337776525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:50:43,407] [INFO] [timer.py:197:stop] 0/2872, 
RunningAvgSamplesPerSec=6.327907951304773, CurrSamplesPerSec=5.717148998402179, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:50:54,684] [INFO] [timer.py:197:stop] 0/2874, RunningAvgSamplesPerSec=6.327916338895959, CurrSamplesPerSec=5.7034958874174935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:51:05,993] [INFO] [timer.py:197:stop] 0/2876, RunningAvgSamplesPerSec=6.327923831231903, CurrSamplesPerSec=5.691680333164429, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:51:17,290] [INFO] [timer.py:197:stop] 0/2878, RunningAvgSamplesPerSec=6.32793017975812, CurrSamplesPerSec=5.69989199664201, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:51:28,548] [INFO] [logging.py:68:log_dist] [Rank 0] step=1440, skipped=5, lr=[7.924444444444444e-06], mom=[[0.9, 0.999]] [2022-12-17 00:51:28,549] [INFO] [timer.py:197:stop] 0/2880, RunningAvgSamplesPerSec=6.327947443597498, CurrSamplesPerSec=5.707494340780261, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:51:39,887] [INFO] [timer.py:197:stop] 0/2882, RunningAvgSamplesPerSec=6.327938244011372, CurrSamplesPerSec=5.671862933828582, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:51:51,197] [INFO] [timer.py:197:stop] 0/2884, RunningAvgSamplesPerSec=6.327946344601117, CurrSamplesPerSec=5.723690822156731, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:52:02,505] [INFO] [timer.py:197:stop] 0/2886, RunningAvgSamplesPerSec=6.327955051767076, CurrSamplesPerSec=5.702812738417894, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:52:13,978] [INFO] [timer.py:197:stop] 0/2888, RunningAvgSamplesPerSec=6.327962604578867, CurrSamplesPerSec=5.709647715323768, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:52:25,230] [INFO] [timer.py:197:stop] 0/2890, RunningAvgSamplesPerSec=6.327974743924771, CurrSamplesPerSec=5.729162837413578, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:52:36,541] [INFO] [timer.py:197:stop] 0/2892, 
RunningAvgSamplesPerSec=6.327982849160652, CurrSamplesPerSec=5.6937498096324335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:52:47,845] [INFO] [timer.py:197:stop] 0/2894, RunningAvgSamplesPerSec=6.32799236964038, CurrSamplesPerSec=5.705542678687533, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:52:59,138] [INFO] [timer.py:197:stop] 0/2896, RunningAvgSamplesPerSec=6.328000738339299, CurrSamplesPerSec=5.686164499770041, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:53:10,439] [INFO] [timer.py:197:stop] 0/2898, RunningAvgSamplesPerSec=6.328011091842845, CurrSamplesPerSec=5.701617439717182, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:53:21,757] [INFO] [logging.py:68:log_dist] [Rank 0] step=1450, skipped=5, lr=[7.902222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 00:53:21,759] [INFO] [timer.py:197:stop] 0/2900, RunningAvgSamplesPerSec=6.328008276848249, CurrSamplesPerSec=5.682868068684691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0042, 'learning_rate': 7.902222222222223e-06, 'epoch': 6.14} [2022-12-17 00:53:33,101] [INFO] [timer.py:197:stop] 0/2902, RunningAvgSamplesPerSec=6.328002365385474, CurrSamplesPerSec=5.690576310554871, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:53:44,416] [INFO] [timer.py:197:stop] 0/2904, RunningAvgSamplesPerSec=6.328008007908925, CurrSamplesPerSec=5.688421865117699, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:53:55,706] [INFO] [timer.py:197:stop] 0/2906, RunningAvgSamplesPerSec=6.32802756176131, CurrSamplesPerSec=5.7326287772585305, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:54:07,008] [INFO] [timer.py:197:stop] 0/2908, RunningAvgSamplesPerSec=6.328038384867272, CurrSamplesPerSec=5.706880117177133, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:54:18,318] [INFO] [timer.py:197:stop] 0/2910, RunningAvgSamplesPerSec=6.32804560058214, CurrSamplesPerSec=5.695414015063271, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-17 00:54:29,633] [INFO] [timer.py:197:stop] 0/2912, RunningAvgSamplesPerSec=6.328050628624341, CurrSamplesPerSec=5.713400637270775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:54:40,963] [INFO] [timer.py:197:stop] 0/2914, RunningAvgSamplesPerSec=6.328050203289852, CurrSamplesPerSec=5.678421350142625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:54:52,263] [INFO] [timer.py:197:stop] 0/2916, RunningAvgSamplesPerSec=6.328062633120781, CurrSamplesPerSec=5.725012607653451, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:55:03,579] [INFO] [timer.py:197:stop] 0/2918, RunningAvgSamplesPerSec=6.328068609223025, CurrSamplesPerSec=5.691808017180035, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:55:14,905] [INFO] [logging.py:68:log_dist] [Rank 0] step=1460, skipped=5, lr=[7.88e-06], mom=[[0.9, 0.999]] [2022-12-17 00:55:14,907] [INFO] [timer.py:197:stop] 0/2920, RunningAvgSamplesPerSec=6.328069419533752, CurrSamplesPerSec=5.68772930662516, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:55:26,187] [INFO] [timer.py:197:stop] 0/2922, RunningAvgSamplesPerSec=6.328090564627229, CurrSamplesPerSec=5.727310949560448, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:55:37,505] [INFO] [timer.py:197:stop] 0/2924, RunningAvgSamplesPerSec=6.328096957132478, CurrSamplesPerSec=5.695935849996165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:55:48,808] [INFO] [timer.py:197:stop] 0/2926, RunningAvgSamplesPerSec=6.328107082929748, CurrSamplesPerSec=5.70548034649371, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:56:00,138] [INFO] [timer.py:197:stop] 0/2928, RunningAvgSamplesPerSec=6.328098667754056, CurrSamplesPerSec=5.683542596569245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:56:11,544] [INFO] [timer.py:197:stop] 0/2930, RunningAvgSamplesPerSec=6.328100464276708, CurrSamplesPerSec=5.68885778373489, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:56:22,880] 
[INFO] [timer.py:197:stop] 0/2932, RunningAvgSamplesPerSec=6.328096690063188, CurrSamplesPerSec=5.688263475398798, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:56:34,206] [INFO] [timer.py:197:stop] 0/2934, RunningAvgSamplesPerSec=6.328093117601106, CurrSamplesPerSec=5.680745662337398, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:56:45,527] [INFO] [timer.py:197:stop] 0/2936, RunningAvgSamplesPerSec=6.328096112538995, CurrSamplesPerSec=5.6893685300193875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:56:56,846] [INFO] [timer.py:197:stop] 0/2938, RunningAvgSamplesPerSec=6.328101176150009, CurrSamplesPerSec=5.711024745044085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:57:08,127] [INFO] [logging.py:68:log_dist] [Rank 0] step=1470, skipped=5, lr=[7.857777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 00:57:08,129] [INFO] [timer.py:197:stop] 0/2940, RunningAvgSamplesPerSec=6.3281204012544725, CurrSamplesPerSec=5.70905342657683, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:57:19,476] [INFO] [timer.py:197:stop] 0/2942, RunningAvgSamplesPerSec=6.328114388296346, CurrSamplesPerSec=5.67222967619481, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:57:30,750] [INFO] [timer.py:197:stop] 0/2944, RunningAvgSamplesPerSec=6.3281320847324265, CurrSamplesPerSec=5.725135197931068, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:57:42,101] [INFO] [timer.py:197:stop] 0/2946, RunningAvgSamplesPerSec=6.328123390598462, CurrSamplesPerSec=5.693491616380447, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:57:53,409] [INFO] [timer.py:197:stop] 0/2948, RunningAvgSamplesPerSec=6.328131816672485, CurrSamplesPerSec=5.72276832667421, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:58:04,719] [INFO] [timer.py:197:stop] 0/2950, RunningAvgSamplesPerSec=6.328138734874885, CurrSamplesPerSec=5.704884257576782, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0036, 'learning_rate': 
7.846666666666667e-06, 'epoch': 6.25} [2022-12-17 00:58:16,022] [INFO] [timer.py:197:stop] 0/2952, RunningAvgSamplesPerSec=6.328143859408265, CurrSamplesPerSec=5.697003020922854, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:58:27,330] [INFO] [timer.py:197:stop] 0/2954, RunningAvgSamplesPerSec=6.328152288098311, CurrSamplesPerSec=5.707560357664528, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:58:38,599] [INFO] [timer.py:197:stop] 0/2956, RunningAvgSamplesPerSec=6.3281775886127, CurrSamplesPerSec=5.746036882029965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:58:49,882] [INFO] [timer.py:197:stop] 0/2958, RunningAvgSamplesPerSec=6.328195107651893, CurrSamplesPerSec=5.729716802821961, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:59:01,199] [INFO] [logging.py:68:log_dist] [Rank 0] step=1480, skipped=5, lr=[7.835555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 00:59:01,200] [INFO] [timer.py:197:stop] 0/2960, RunningAvgSamplesPerSec=6.328192492674206, CurrSamplesPerSec=5.675167243100677, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:59:12,524] [INFO] [timer.py:197:stop] 0/2962, RunningAvgSamplesPerSec=6.328193774002038, CurrSamplesPerSec=5.702767911729754, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:59:23,848] [INFO] [timer.py:197:stop] 0/2964, RunningAvgSamplesPerSec=6.328196359985549, CurrSamplesPerSec=5.69108712416596, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:59:35,223] [INFO] [timer.py:197:stop] 0/2966, RunningAvgSamplesPerSec=6.328176776650489, CurrSamplesPerSec=5.6551737562799635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:59:46,518] [INFO] [timer.py:197:stop] 0/2968, RunningAvgSamplesPerSec=6.328183787121418, CurrSamplesPerSec=5.708379142483334, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 00:59:57,870] [INFO] [timer.py:197:stop] 0/2970, RunningAvgSamplesPerSec=6.328175802544738, CurrSamplesPerSec=5.6941065847717365, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 01:00:09,190] [INFO] [timer.py:197:stop] 0/2972, RunningAvgSamplesPerSec=6.328178575983986, CurrSamplesPerSec=5.687292114979654, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:00:20,444] [INFO] [timer.py:197:stop] 0/2974, RunningAvgSamplesPerSec=6.32819556655408, CurrSamplesPerSec=5.728663748848603, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:00:31,777] [INFO] [timer.py:197:stop] 0/2976, RunningAvgSamplesPerSec=6.328196985255409, CurrSamplesPerSec=5.705050849219822, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:00:43,090] [INFO] [timer.py:197:stop] 0/2978, RunningAvgSamplesPerSec=6.328196506480381, CurrSamplesPerSec=5.679157781284493, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:00:54,453] [INFO] [logging.py:68:log_dist] [Rank 0] step=1490, skipped=5, lr=[7.813333333333334e-06], mom=[[0.9, 0.999]] [2022-12-17 01:00:54,455] [INFO] [timer.py:197:stop] 0/2980, RunningAvgSamplesPerSec=6.328180755791063, CurrSamplesPerSec=5.674602902137952, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:01:05,768] [INFO] [timer.py:197:stop] 0/2982, RunningAvgSamplesPerSec=6.328186646636752, CurrSamplesPerSec=5.689070221829174, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:01:17,073] [INFO] [timer.py:197:stop] 0/2984, RunningAvgSamplesPerSec=6.328195956601233, CurrSamplesPerSec=5.706514703516415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:01:28,369] [INFO] [timer.py:197:stop] 0/2986, RunningAvgSamplesPerSec=6.328196240513767, CurrSamplesPerSec=5.6864718992358085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:01:39,680] [INFO] [timer.py:197:stop] 0/2988, RunningAvgSamplesPerSec=6.328195471669826, CurrSamplesPerSec=5.705590702084778, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:01:50,966] [INFO] [timer.py:197:stop] 0/2990, RunningAvgSamplesPerSec=6.328212465188259, CurrSamplesPerSec=5.726522885359659, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 01:02:02,300] [INFO] [timer.py:197:stop] 0/2992, RunningAvgSamplesPerSec=6.328211943571726, CurrSamplesPerSec=5.682009919512811, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:02:13,628] [INFO] [timer.py:197:stop] 0/2994, RunningAvgSamplesPerSec=6.328213010624356, CurrSamplesPerSec=5.679872770727517, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:02:24,994] [INFO] [timer.py:197:stop] 0/2996, RunningAvgSamplesPerSec=6.328196403619549, CurrSamplesPerSec=5.635151715865987, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:02:36,321] [INFO] [timer.py:197:stop] 0/2998, RunningAvgSamplesPerSec=6.328198002255571, CurrSamplesPerSec=5.68202483326497, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:02:47,639] [INFO] [logging.py:68:log_dist] [Rank 0] step=1500, skipped=5, lr=[7.791111111111111e-06], mom=[[0.9, 0.999]] [2022-12-17 01:02:47,641] [INFO] [timer.py:197:stop] 0/3000, RunningAvgSamplesPerSec=6.328200625826034, CurrSamplesPerSec=5.707326150252773, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.004, 'learning_rate': 7.791111111111111e-06, 'epoch': 6.36} [2022-12-17 01:02:59,034] [INFO] [timer.py:197:stop] 0/3002, RunningAvgSamplesPerSec=6.328199386225926, CurrSamplesPerSec=5.695768339988552, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:03:10,433] [INFO] [timer.py:197:stop] 0/3004, RunningAvgSamplesPerSec=6.32818221561934, CurrSamplesPerSec=5.649236891850217, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:03:21,836] [INFO] [timer.py:197:stop] 0/3006, RunningAvgSamplesPerSec=6.328171893389803, CurrSamplesPerSec=5.648525790615511, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:03:33,230] [INFO] [timer.py:197:stop] 0/3008, RunningAvgSamplesPerSec=6.32817485497259, CurrSamplesPerSec=5.675626332797857, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:03:44,527] [INFO] [timer.py:197:stop] 0/3010, RunningAvgSamplesPerSec=6.328192386500218, 
CurrSamplesPerSec=5.713108315122546, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:03:55,891] [INFO] [timer.py:197:stop] 0/3012, RunningAvgSamplesPerSec=6.32818153197785, CurrSamplesPerSec=5.6502407265569765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:04:07,186] [INFO] [timer.py:197:stop] 0/3014, RunningAvgSamplesPerSec=6.32820043048318, CurrSamplesPerSec=5.70614302926414, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:04:18,568] [INFO] [timer.py:197:stop] 0/3016, RunningAvgSamplesPerSec=6.328197524571121, CurrSamplesPerSec=5.681629164259772, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:04:29,905] [INFO] [timer.py:197:stop] 0/3018, RunningAvgSamplesPerSec=6.328206598740154, CurrSamplesPerSec=5.702575286144794, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:04:41,344] [INFO] [logging.py:68:log_dist] [Rank 0] step=1510, skipped=5, lr=[7.768888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 01:04:41,346] [INFO] [timer.py:197:stop] 0/3020, RunningAvgSamplesPerSec=6.328211056766832, CurrSamplesPerSec=5.692277528461339, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:04:52,754] [INFO] [timer.py:197:stop] 0/3022, RunningAvgSamplesPerSec=6.3281936764155375, CurrSamplesPerSec=5.664522732458062, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:05:04,136] [INFO] [timer.py:197:stop] 0/3024, RunningAvgSamplesPerSec=6.328204971771824, CurrSamplesPerSec=5.694208770316031, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:05:15,553] [INFO] [timer.py:197:stop] 0/3026, RunningAvgSamplesPerSec=6.328206809337348, CurrSamplesPerSec=5.689582694558557, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:05:26,858] [INFO] [timer.py:197:stop] 0/3028, RunningAvgSamplesPerSec=6.328218086355139, CurrSamplesPerSec=5.7045443147146075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:05:38,271] [INFO] [timer.py:197:stop] 0/3030, RunningAvgSamplesPerSec=6.328196316180822, 
CurrSamplesPerSec=5.621920650997765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:05:49,639] [INFO] [timer.py:197:stop] 0/3032, RunningAvgSamplesPerSec=6.3281938507722595, CurrSamplesPerSec=5.68351371589056, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:06:00,994] [INFO] [timer.py:197:stop] 0/3034, RunningAvgSamplesPerSec=6.328193529434247, CurrSamplesPerSec=5.692921210664173, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:06:12,324] [INFO] [timer.py:197:stop] 0/3036, RunningAvgSamplesPerSec=6.32819211447804, CurrSamplesPerSec=5.683399640079139, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:06:23,619] [INFO] [timer.py:197:stop] 0/3038, RunningAvgSamplesPerSec=6.32820489915555, CurrSamplesPerSec=5.697037117024608, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:06:34,993] [INFO] [logging.py:68:log_dist] [Rank 0] step=1520, skipped=5, lr=[7.746666666666666e-06], mom=[[0.9, 0.999]] [2022-12-17 01:06:34,995] [INFO] [timer.py:197:stop] 0/3040, RunningAvgSamplesPerSec=6.328199343076803, CurrSamplesPerSec=5.664262163134862, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:06:46,295] [INFO] [timer.py:197:stop] 0/3042, RunningAvgSamplesPerSec=6.328224462098267, CurrSamplesPerSec=5.734026714300936, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:06:57,577] [INFO] [timer.py:197:stop] 0/3044, RunningAvgSamplesPerSec=6.328244394666527, CurrSamplesPerSec=5.7183171979788465, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:07:09,060] [INFO] [timer.py:197:stop] 0/3046, RunningAvgSamplesPerSec=6.3282673969346055, CurrSamplesPerSec=5.719744725116697, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:07:20,368] [INFO] [timer.py:197:stop] 0/3048, RunningAvgSamplesPerSec=6.328277404458231, CurrSamplesPerSec=5.707639482854105, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:07:31,625] [INFO] [timer.py:197:stop] 0/3050, RunningAvgSamplesPerSec=6.32830059436814, 
CurrSamplesPerSec=5.704340174492353, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0043, 'learning_rate': 7.735555555555557e-06, 'epoch': 6.46} [2022-12-17 01:07:42,903] [INFO] [timer.py:197:stop] 0/3052, RunningAvgSamplesPerSec=6.328322627436005, CurrSamplesPerSec=5.717866522321677, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:07:54,263] [INFO] [timer.py:197:stop] 0/3054, RunningAvgSamplesPerSec=6.328310776045788, CurrSamplesPerSec=5.644038407696543, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:08:05,606] [INFO] [timer.py:197:stop] 0/3056, RunningAvgSamplesPerSec=6.328307480005151, CurrSamplesPerSec=5.682056585385183, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:08:16,946] [INFO] [timer.py:197:stop] 0/3058, RunningAvgSamplesPerSec=6.328328640712785, CurrSamplesPerSec=5.737794838599024, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:08:28,584] [INFO] [logging.py:68:log_dist] [Rank 0] step=1530, skipped=5, lr=[7.724444444444446e-06], mom=[[0.9, 0.999]] [2022-12-17 01:08:28,586] [INFO] [timer.py:197:stop] 0/3060, RunningAvgSamplesPerSec=6.328350255214509, CurrSamplesPerSec=5.725933139832026, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:08:39,910] [INFO] [timer.py:197:stop] 0/3062, RunningAvgSamplesPerSec=6.328353663895335, CurrSamplesPerSec=5.690856679653279, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:08:51,210] [INFO] [timer.py:197:stop] 0/3064, RunningAvgSamplesPerSec=6.3283712330026605, CurrSamplesPerSec=5.732193714185728, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:09:02,696] [INFO] [timer.py:197:stop] 0/3066, RunningAvgSamplesPerSec=6.328348230192032, CurrSamplesPerSec=5.670931429706914, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:09:14,003] [INFO] [timer.py:197:stop] 0/3068, RunningAvgSamplesPerSec=6.328349799635841, CurrSamplesPerSec=5.696333514216841, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:09:25,298] [INFO] 
[timer.py:197:stop] 0/3070, RunningAvgSamplesPerSec=6.328363329240027, CurrSamplesPerSec=5.728342480612314, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:09:36,650] [INFO] [timer.py:197:stop] 0/3072, RunningAvgSamplesPerSec=6.328353736271112, CurrSamplesPerSec=5.70964358620523, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:09:47,990] [INFO] [timer.py:197:stop] 0/3074, RunningAvgSamplesPerSec=6.328349460626023, CurrSamplesPerSec=5.692681442553798, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:09:59,296] [INFO] [timer.py:197:stop] 0/3076, RunningAvgSamplesPerSec=6.328346360291402, CurrSamplesPerSec=5.692624944247637, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:10:10,941] [INFO] [timer.py:197:stop] 0/3078, RunningAvgSamplesPerSec=6.328340149333154, CurrSamplesPerSec=5.694005610697106, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:10:22,226] [INFO] [logging.py:68:log_dist] [Rank 0] step=1540, skipped=5, lr=[7.702222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 01:10:22,228] [INFO] [timer.py:197:stop] 0/3080, RunningAvgSamplesPerSec=6.328350389748184, CurrSamplesPerSec=5.702016866649183, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:10:33,719] [INFO] [timer.py:197:stop] 0/3082, RunningAvgSamplesPerSec=6.328292988477743, CurrSamplesPerSec=5.544842368600772, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:10:45,263] [INFO] [timer.py:197:stop] 0/3084, RunningAvgSamplesPerSec=6.32829325268185, CurrSamplesPerSec=5.685042388422377, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:10:56,821] [INFO] [timer.py:197:stop] 0/3086, RunningAvgSamplesPerSec=6.3283058528128615, CurrSamplesPerSec=5.722950849799248, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:11:08,163] [INFO] [timer.py:197:stop] 0/3088, RunningAvgSamplesPerSec=6.32829141345936, CurrSamplesPerSec=5.654485696188213, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:11:19,396] [INFO] 
[timer.py:197:stop] 0/3090, RunningAvgSamplesPerSec=6.328328365422752, CurrSamplesPerSec=5.7478484709252315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:11:30,683] [INFO] [timer.py:197:stop] 0/3092, RunningAvgSamplesPerSec=6.328336061377952, CurrSamplesPerSec=5.695775349589328, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:11:42,330] [INFO] [timer.py:197:stop] 0/3094, RunningAvgSamplesPerSec=6.3282051707003575, CurrSamplesPerSec=5.388241062793341, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:11:53,643] [INFO] [timer.py:197:stop] 0/3096, RunningAvgSamplesPerSec=6.328209741459312, CurrSamplesPerSec=5.7090888812825025, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:12:04,963] [INFO] [timer.py:197:stop] 0/3098, RunningAvgSamplesPerSec=6.328211787634497, CurrSamplesPerSec=5.703349986587, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:12:16,613] [INFO] [logging.py:68:log_dist] [Rank 0] step=1550, skipped=5, lr=[7.680000000000001e-06], mom=[[0.9, 0.999]] [2022-12-17 01:12:16,615] [INFO] [timer.py:197:stop] 0/3100, RunningAvgSamplesPerSec=6.328228541001663, CurrSamplesPerSec=5.715261548115209, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0049, 'learning_rate': 7.680000000000001e-06, 'epoch': 6.57} [2022-12-17 01:12:27,913] [INFO] [timer.py:197:stop] 0/3102, RunningAvgSamplesPerSec=6.328239308792876, CurrSamplesPerSec=5.701619135165028, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:12:39,435] [INFO] [timer.py:197:stop] 0/3104, RunningAvgSamplesPerSec=6.32823721816558, CurrSamplesPerSec=5.669716884049624, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:12:51,005] [INFO] [timer.py:197:stop] 0/3106, RunningAvgSamplesPerSec=6.328228348857222, CurrSamplesPerSec=5.703074928081486, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:13:02,338] [INFO] [timer.py:197:stop] 0/3108, RunningAvgSamplesPerSec=6.328225401121336, CurrSamplesPerSec=5.68624785088645, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 01:13:13,623] [INFO] [timer.py:197:stop] 0/3110, RunningAvgSamplesPerSec=6.328242571516347, CurrSamplesPerSec=5.699047815744143, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:13:25,020] [INFO] [timer.py:197:stop] 0/3112, RunningAvgSamplesPerSec=6.328233285287763, CurrSamplesPerSec=5.688660792324933, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:13:36,334] [INFO] [timer.py:197:stop] 0/3114, RunningAvgSamplesPerSec=6.328238544947004, CurrSamplesPerSec=5.686951132873655, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:13:47,651] [INFO] [timer.py:197:stop] 0/3116, RunningAvgSamplesPerSec=6.3282439130296275, CurrSamplesPerSec=5.702263962870334, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:13:59,310] [INFO] [timer.py:197:stop] 0/3118, RunningAvgSamplesPerSec=6.328232184084257, CurrSamplesPerSec=5.701733701334038, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:14:10,649] [INFO] [logging.py:68:log_dist] [Rank 0] step=1560, skipped=5, lr=[7.657777777777779e-06], mom=[[0.9, 0.999]] [2022-12-17 01:14:10,650] [INFO] [timer.py:197:stop] 0/3120, RunningAvgSamplesPerSec=6.328226877412343, CurrSamplesPerSec=5.668705160002777, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:14:22,001] [INFO] [timer.py:197:stop] 0/3122, RunningAvgSamplesPerSec=6.32823370378307, CurrSamplesPerSec=5.694993522906592, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:14:33,559] [INFO] [timer.py:197:stop] 0/3124, RunningAvgSamplesPerSec=6.328224777775604, CurrSamplesPerSec=5.713356616797894, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:14:44,886] [INFO] [timer.py:197:stop] 0/3126, RunningAvgSamplesPerSec=6.328225809771268, CurrSamplesPerSec=5.680624003958813, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:14:56,182] [INFO] [timer.py:197:stop] 0/3128, RunningAvgSamplesPerSec=6.3282387265643685, CurrSamplesPerSec=5.708427456427208, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 01:15:07,854] [INFO] [timer.py:197:stop] 0/3130, RunningAvgSamplesPerSec=6.32810360356806, CurrSamplesPerSec=5.710048025286301, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:15:19,160] [INFO] [timer.py:197:stop] 0/3132, RunningAvgSamplesPerSec=6.328104647297075, CurrSamplesPerSec=5.685813295606887, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:15:30,463] [INFO] [timer.py:197:stop] 0/3134, RunningAvgSamplesPerSec=6.328114366376503, CurrSamplesPerSec=5.7077627866023075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:15:41,961] [INFO] [timer.py:197:stop] 0/3136, RunningAvgSamplesPerSec=6.328043790125609, CurrSamplesPerSec=5.7158843930675225, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:15:53,281] [INFO] [timer.py:197:stop] 0/3138, RunningAvgSamplesPerSec=6.328046788533697, CurrSamplesPerSec=5.686063807171815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:16:04,605] [INFO] [logging.py:68:log_dist] [Rank 0] step=1570, skipped=5, lr=[7.635555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 01:16:04,607] [INFO] [timer.py:197:stop] 0/3140, RunningAvgSamplesPerSec=6.328046055998555, CurrSamplesPerSec=5.676481834464272, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:16:16,050] [INFO] [timer.py:197:stop] 0/3142, RunningAvgSamplesPerSec=6.327998340066359, CurrSamplesPerSec=5.698629690303827, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:16:27,327] [INFO] [timer.py:197:stop] 0/3144, RunningAvgSamplesPerSec=6.328019294096658, CurrSamplesPerSec=5.7174458746110774, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:16:38,661] [INFO] [timer.py:197:stop] 0/3146, RunningAvgSamplesPerSec=6.328018157319334, CurrSamplesPerSec=5.6957975870577, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:16:50,065] [INFO] [timer.py:197:stop] 0/3148, RunningAvgSamplesPerSec=6.327986933018697, CurrSamplesPerSec=5.6882716719106, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 01:17:01,366] [INFO] [timer.py:197:stop] 0/3150, RunningAvgSamplesPerSec=6.327998111905243, CurrSamplesPerSec=5.713494031017451, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0044, 'learning_rate': 7.624444444444445e-06, 'epoch': 6.67} [2022-12-17 01:17:12,671] [INFO] [timer.py:197:stop] 0/3152, RunningAvgSamplesPerSec=6.32800782192515, CurrSamplesPerSec=5.712859062377203, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:17:23,999] [INFO] [timer.py:197:stop] 0/3154, RunningAvgSamplesPerSec=6.328007133184659, CurrSamplesPerSec=5.704862191582309, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:17:35,292] [INFO] [timer.py:197:stop] 0/3156, RunningAvgSamplesPerSec=6.328008343341138, CurrSamplesPerSec=5.702299091032531, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:17:46,716] [INFO] [timer.py:197:stop] 0/3158, RunningAvgSamplesPerSec=6.327974298784042, CurrSamplesPerSec=5.598834661803304, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:17:58,351] [INFO] [logging.py:68:log_dist] [Rank 0] step=1580, skipped=5, lr=[7.613333333333334e-06], mom=[[0.9, 0.999]] [2022-12-17 01:17:58,353] [INFO] [timer.py:197:stop] 0/3160, RunningAvgSamplesPerSec=6.327970421835056, CurrSamplesPerSec=5.6944965039264135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:18:09,724] [INFO] [timer.py:197:stop] 0/3162, RunningAvgSamplesPerSec=6.327974121817204, CurrSamplesPerSec=5.690142058399431, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:18:21,064] [INFO] [timer.py:197:stop] 0/3164, RunningAvgSamplesPerSec=6.327969293519854, CurrSamplesPerSec=5.670093169884984, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:18:32,571] [INFO] [timer.py:197:stop] 0/3166, RunningAvgSamplesPerSec=6.327975653807134, CurrSamplesPerSec=5.716188928224968, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:18:43,904] [INFO] [timer.py:197:stop] 0/3168, RunningAvgSamplesPerSec=6.32798179357277, 
CurrSamplesPerSec=5.698376134854511, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:18:55,575] [INFO] [timer.py:197:stop] 0/3170, RunningAvgSamplesPerSec=6.327846338624374, CurrSamplesPerSec=5.3722874932535065, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:19:07,004] [INFO] [timer.py:197:stop] 0/3172, RunningAvgSamplesPerSec=6.327857912207166, CurrSamplesPerSec=5.710076690410414, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:19:18,288] [INFO] [timer.py:197:stop] 0/3174, RunningAvgSamplesPerSec=6.32786939326772, CurrSamplesPerSec=5.717158983073055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:19:29,608] [INFO] [timer.py:197:stop] 0/3176, RunningAvgSamplesPerSec=6.327860409614954, CurrSamplesPerSec=5.64593135025226, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:19:40,950] [INFO] [timer.py:197:stop] 0/3178, RunningAvgSamplesPerSec=6.327863214845376, CurrSamplesPerSec=5.690820003295321, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:19:52,413] [INFO] [logging.py:68:log_dist] [Rank 0] step=1590, skipped=5, lr=[7.5911111111111115e-06], mom=[[0.9, 0.999]] [2022-12-17 01:19:52,414] [INFO] [timer.py:197:stop] 0/3180, RunningAvgSamplesPerSec=6.327857851576632, CurrSamplesPerSec=5.690702979919093, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:20:03,819] [INFO] [timer.py:197:stop] 0/3182, RunningAvgSamplesPerSec=6.327821539117611, CurrSamplesPerSec=5.589438273183318, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:20:15,308] [INFO] [timer.py:197:stop] 0/3184, RunningAvgSamplesPerSec=6.3278335370182965, CurrSamplesPerSec=5.70394527009692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:20:26,668] [INFO] [timer.py:197:stop] 0/3186, RunningAvgSamplesPerSec=6.327832117311527, CurrSamplesPerSec=5.6870434229343845, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:20:38,217] [INFO] [timer.py:197:stop] 0/3188, RunningAvgSamplesPerSec=6.327744950423371, 
CurrSamplesPerSec=5.49062223728194, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:20:49,700] [INFO] [timer.py:197:stop] 0/3190, RunningAvgSamplesPerSec=6.327757491639702, CurrSamplesPerSec=5.707426141117586, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:21:01,022] [INFO] [timer.py:197:stop] 0/3192, RunningAvgSamplesPerSec=6.32775339708403, CurrSamplesPerSec=5.679734325104576, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:21:12,701] [INFO] [timer.py:197:stop] 0/3194, RunningAvgSamplesPerSec=6.327616548561185, CurrSamplesPerSec=5.348186843516585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:21:24,002] [INFO] [timer.py:197:stop] 0/3196, RunningAvgSamplesPerSec=6.327615046230956, CurrSamplesPerSec=5.6908863588942635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:21:35,314] [INFO] [timer.py:197:stop] 0/3198, RunningAvgSamplesPerSec=6.32762007083936, CurrSamplesPerSec=5.7030361554478, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:21:46,722] [INFO] [logging.py:68:log_dist] [Rank 0] step=1600, skipped=5, lr=[7.56888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 01:21:46,724] [INFO] [timer.py:197:stop] 0/3200, RunningAvgSamplesPerSec=6.32757363728904, CurrSamplesPerSec=5.5781339101464695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0042, 'learning_rate': 7.56888888888889e-06, 'epoch': 6.78} [2022-12-17 01:21:57,995] [INFO] [timer.py:197:stop] 0/3202, RunningAvgSamplesPerSec=6.327589379341759, CurrSamplesPerSec=5.700374705537238, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:22:09,310] [INFO] [timer.py:197:stop] 0/3204, RunningAvgSamplesPerSec=6.32759287258235, CurrSamplesPerSec=5.696930235574727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:22:20,680] [INFO] [timer.py:197:stop] 0/3206, RunningAvgSamplesPerSec=6.32757518739998, CurrSamplesPerSec=5.629936601948145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:22:31,975] [INFO] [timer.py:197:stop] 
0/3208, RunningAvgSamplesPerSec=6.327587198267436, CurrSamplesPerSec=5.728729522817179, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:22:43,303] [INFO] [timer.py:197:stop] 0/3210, RunningAvgSamplesPerSec=6.3275805587083065, CurrSamplesPerSec=5.692876056414973, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:22:54,687] [INFO] [timer.py:197:stop] 0/3212, RunningAvgSamplesPerSec=6.327552061273541, CurrSamplesPerSec=5.62546384037998, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:23:05,975] [INFO] [timer.py:197:stop] 0/3214, RunningAvgSamplesPerSec=6.327566489107612, CurrSamplesPerSec=5.727871645706151, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:23:17,250] [INFO] [timer.py:197:stop] 0/3216, RunningAvgSamplesPerSec=6.32758572146028, CurrSamplesPerSec=5.724705666582072, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:23:28,663] [INFO] [timer.py:197:stop] 0/3218, RunningAvgSamplesPerSec=6.327547048615964, CurrSamplesPerSec=5.602575408343329, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:23:39,974] [INFO] [logging.py:68:log_dist] [Rank 0] step=1610, skipped=5, lr=[7.5466666666666675e-06], mom=[[0.9, 0.999]] [2022-12-17 01:23:39,976] [INFO] [timer.py:197:stop] 0/3220, RunningAvgSamplesPerSec=6.327555916639667, CurrSamplesPerSec=5.693980971633932, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:23:51,452] [INFO] [timer.py:197:stop] 0/3222, RunningAvgSamplesPerSec=6.327556455274264, CurrSamplesPerSec=5.676542814405839, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:24:02,783] [INFO] [timer.py:197:stop] 0/3224, RunningAvgSamplesPerSec=6.327543073056566, CurrSamplesPerSec=5.64955790892777, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:24:14,310] [INFO] [timer.py:197:stop] 0/3226, RunningAvgSamplesPerSec=6.327543805243693, CurrSamplesPerSec=5.700171105707942, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:24:25,643] [INFO] [timer.py:197:stop] 0/3228, 
RunningAvgSamplesPerSec=6.327543698754075, CurrSamplesPerSec=5.690341564909424, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:24:37,337] [INFO] [timer.py:197:stop] 0/3230, RunningAvgSamplesPerSec=6.327403019420782, CurrSamplesPerSec=5.337525799770517, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:24:48,635] [INFO] [timer.py:197:stop] 0/3232, RunningAvgSamplesPerSec=6.3274164819432155, CurrSamplesPerSec=5.710538044157391, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:24:59,955] [INFO] [timer.py:197:stop] 0/3234, RunningAvgSamplesPerSec=6.327423771765399, CurrSamplesPerSec=5.703635493871505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:25:11,634] [INFO] [timer.py:197:stop] 0/3236, RunningAvgSamplesPerSec=6.327421480528153, CurrSamplesPerSec=5.694893725552477, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:25:22,965] [INFO] [timer.py:197:stop] 0/3238, RunningAvgSamplesPerSec=6.327420600905887, CurrSamplesPerSec=5.67650440169151, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:25:34,557] [INFO] [logging.py:68:log_dist] [Rank 0] step=1620, skipped=5, lr=[7.524444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 01:25:34,559] [INFO] [timer.py:197:stop] 0/3240, RunningAvgSamplesPerSec=6.327429179698878, CurrSamplesPerSec=5.699057253311078, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:25:46,133] [INFO] [timer.py:197:stop] 0/3242, RunningAvgSamplesPerSec=6.32741687103007, CurrSamplesPerSec=5.676417974985021, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:25:57,425] [INFO] [timer.py:197:stop] 0/3244, RunningAvgSamplesPerSec=6.327418780991531, CurrSamplesPerSec=5.702237071948988, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:26:08,782] [INFO] [timer.py:197:stop] 0/3246, RunningAvgSamplesPerSec=6.327418209197686, CurrSamplesPerSec=5.68327979321014, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:26:20,094] [INFO] [timer.py:197:stop] 0/3248, 
RunningAvgSamplesPerSec=6.327419606512401, CurrSamplesPerSec=5.695729183225739, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:26:31,392] [INFO] [timer.py:197:stop] 0/3250, RunningAvgSamplesPerSec=6.327426160437134, CurrSamplesPerSec=5.701693978081709, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0047, 'learning_rate': 7.513333333333334e-06, 'epoch': 6.89} [2022-12-17 01:26:42,901] [INFO] [timer.py:197:stop] 0/3252, RunningAvgSamplesPerSec=6.327439818024092, CurrSamplesPerSec=5.7171709160181425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:26:54,483] [INFO] [timer.py:197:stop] 0/3254, RunningAvgSamplesPerSec=6.327398825876986, CurrSamplesPerSec=5.6259394486595, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:27:05,859] [INFO] [timer.py:197:stop] 0/3256, RunningAvgSamplesPerSec=6.327402350044329, CurrSamplesPerSec=5.708757421255074, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:27:17,373] [INFO] [timer.py:197:stop] 0/3258, RunningAvgSamplesPerSec=6.327417754863015, CurrSamplesPerSec=5.714677282811018, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:27:28,757] [INFO] [logging.py:68:log_dist] [Rank 0] step=1630, skipped=5, lr=[7.502222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 01:27:28,759] [INFO] [timer.py:197:stop] 0/3260, RunningAvgSamplesPerSec=6.327395600822194, CurrSamplesPerSec=5.708160647422357, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:27:40,034] [INFO] [timer.py:197:stop] 0/3262, RunningAvgSamplesPerSec=6.327406874963428, CurrSamplesPerSec=5.70087105619365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:27:51,590] [INFO] [timer.py:197:stop] 0/3264, RunningAvgSamplesPerSec=6.327415745958481, CurrSamplesPerSec=5.710762066912186, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:28:03,039] [INFO] [timer.py:197:stop] 0/3266, RunningAvgSamplesPerSec=6.327404836900336, CurrSamplesPerSec=5.713503516490646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-17 01:28:14,396] [INFO] [timer.py:197:stop] 0/3268, RunningAvgSamplesPerSec=6.327396207831295, CurrSamplesPerSec=5.646276457009662, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:28:25,688] [INFO] [timer.py:197:stop] 0/3270, RunningAvgSamplesPerSec=6.3274121524731095, CurrSamplesPerSec=5.7055994336984055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:28:36,959] [INFO] [timer.py:197:stop] 0/3272, RunningAvgSamplesPerSec=6.327425382008135, CurrSamplesPerSec=5.689047554579456, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:28:48,243] [INFO] [timer.py:197:stop] 0/3274, RunningAvgSamplesPerSec=6.3274379194298485, CurrSamplesPerSec=5.697619473741943, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:28:59,539] [INFO] [timer.py:197:stop] 0/3276, RunningAvgSamplesPerSec=6.32745146020036, CurrSamplesPerSec=5.718524776283749, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:29:10,946] [INFO] [timer.py:197:stop] 0/3278, RunningAvgSamplesPerSec=6.327423523144233, CurrSamplesPerSec=5.705472342847645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:29:22,224] [INFO] [logging.py:68:log_dist] [Rank 0] step=1640, skipped=5, lr=[7.48e-06], mom=[[0.9, 0.999]] [2022-12-17 01:29:22,226] [INFO] [timer.py:197:stop] 0/3280, RunningAvgSamplesPerSec=6.327442652650779, CurrSamplesPerSec=5.71626756247743, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:29:33,726] [INFO] [timer.py:197:stop] 0/3282, RunningAvgSamplesPerSec=6.327382545022914, CurrSamplesPerSec=5.5422214418806774, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:29:45,216] [INFO] [timer.py:197:stop] 0/3284, RunningAvgSamplesPerSec=6.327418272970632, CurrSamplesPerSec=5.757968026648921, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:29:56,469] [INFO] [timer.py:197:stop] 0/3286, RunningAvgSamplesPerSec=6.327447503106981, CurrSamplesPerSec=5.728942503911237, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
01:30:07,728] [INFO] [timer.py:197:stop] 0/3288, RunningAvgSamplesPerSec=6.327468220307309, CurrSamplesPerSec=5.715481804066884, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:30:19,024] [INFO] [timer.py:197:stop] 0/3290, RunningAvgSamplesPerSec=6.327485129623551, CurrSamplesPerSec=5.706678721389803, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:30:30,322] [INFO] [timer.py:197:stop] 0/3292, RunningAvgSamplesPerSec=6.327491937727378, CurrSamplesPerSec=5.6889142072886045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:30:41,761] [INFO] [timer.py:197:stop] 0/3294, RunningAvgSamplesPerSec=6.327486622166938, CurrSamplesPerSec=5.659724931531434, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:30:53,050] [INFO] [timer.py:197:stop] 0/3296, RunningAvgSamplesPerSec=6.327504146250921, CurrSamplesPerSec=5.722502615290213, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:31:04,359] [INFO] [timer.py:197:stop] 0/3298, RunningAvgSamplesPerSec=6.32751865527712, CurrSamplesPerSec=5.730489843066372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:31:15,663] [INFO] [logging.py:68:log_dist] [Rank 0] step=1650, skipped=5, lr=[7.457777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 01:31:15,665] [INFO] [timer.py:197:stop] 0/3300, RunningAvgSamplesPerSec=6.3275283883850655, CurrSamplesPerSec=5.694366524941489, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.006, 'learning_rate': 7.457777777777778e-06, 'epoch': 6.99} [2022-12-17 01:31:27,296] [INFO] [timer.py:197:stop] 0/3302, RunningAvgSamplesPerSec=6.327415691313412, CurrSamplesPerSec=5.403756153207737, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:31:35,779] [INFO] [timer.py:197:stop] 0/3304, RunningAvgSamplesPerSec=6.328368075171492, CurrSamplesPerSec=10.19376284874849, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:31:47,072] [INFO] [timer.py:197:stop] 0/3306, RunningAvgSamplesPerSec=6.328381889283473, 
CurrSamplesPerSec=5.726395837946484, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:31:58,454] [INFO] [timer.py:197:stop] 0/3308, RunningAvgSamplesPerSec=6.3283671736582425, CurrSamplesPerSec=5.629318652175197, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:32:10,088] [INFO] [timer.py:197:stop] 0/3310, RunningAvgSamplesPerSec=6.328373614216493, CurrSamplesPerSec=5.716848256979192, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:32:21,660] [INFO] [timer.py:197:stop] 0/3312, RunningAvgSamplesPerSec=6.328350430073364, CurrSamplesPerSec=5.590974509773374, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:32:32,935] [INFO] [timer.py:197:stop] 0/3314, RunningAvgSamplesPerSec=6.328370576080663, CurrSamplesPerSec=5.732091629603752, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:32:44,636] [INFO] [timer.py:197:stop] 0/3316, RunningAvgSamplesPerSec=6.328365927196959, CurrSamplesPerSec=5.682405641553271, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:32:56,099] [INFO] [timer.py:197:stop] 0/3318, RunningAvgSamplesPerSec=6.328378275281746, CurrSamplesPerSec=5.712058439492161, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:33:07,760] [INFO] [logging.py:68:log_dist] [Rank 0] step=1660, skipped=5, lr=[7.435555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 01:33:07,762] [INFO] [timer.py:197:stop] 0/3320, RunningAvgSamplesPerSec=6.328251885037532, CurrSamplesPerSec=5.376068743066508, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:33:19,283] [INFO] [timer.py:197:stop] 0/3322, RunningAvgSamplesPerSec=6.328261381499621, CurrSamplesPerSec=5.7035596304887495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:33:30,629] [INFO] [timer.py:197:stop] 0/3324, RunningAvgSamplesPerSec=6.328286561291607, CurrSamplesPerSec=5.727362761688731, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:33:41,914] [INFO] [timer.py:197:stop] 0/3326, RunningAvgSamplesPerSec=6.328296633717355, 
CurrSamplesPerSec=5.687658927245805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:33:53,317] [INFO] [timer.py:197:stop] 0/3328, RunningAvgSamplesPerSec=6.328316702351698, CurrSamplesPerSec=5.735226819198422, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:34:04,597] [INFO] [timer.py:197:stop] 0/3330, RunningAvgSamplesPerSec=6.328335218513787, CurrSamplesPerSec=5.72160806423622, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:34:15,952] [INFO] [timer.py:197:stop] 0/3332, RunningAvgSamplesPerSec=6.328321934465862, CurrSamplesPerSec=5.621941844546697, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:34:27,233] [INFO] [timer.py:197:stop] 0/3334, RunningAvgSamplesPerSec=6.328333758938271, CurrSamplesPerSec=5.710809692236099, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:34:38,478] [INFO] [timer.py:197:stop] 0/3336, RunningAvgSamplesPerSec=6.328358943259871, CurrSamplesPerSec=5.736436743153215, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:34:50,063] [INFO] [timer.py:197:stop] 0/3338, RunningAvgSamplesPerSec=6.328262610423019, CurrSamplesPerSec=5.446236956626006, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:35:01,355] [INFO] [logging.py:68:log_dist] [Rank 0] step=1670, skipped=5, lr=[7.413333333333333e-06], mom=[[0.9, 0.999]] [2022-12-17 01:35:01,356] [INFO] [timer.py:197:stop] 0/3340, RunningAvgSamplesPerSec=6.3282707354726195, CurrSamplesPerSec=5.697444367262241, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:35:12,660] [INFO] [timer.py:197:stop] 0/3342, RunningAvgSamplesPerSec=6.32828023530498, CurrSamplesPerSec=5.694936978697808, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:35:24,015] [INFO] [timer.py:197:stop] 0/3344, RunningAvgSamplesPerSec=6.328290936313625, CurrSamplesPerSec=5.710620896512029, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:35:35,346] [INFO] [timer.py:197:stop] 0/3346, RunningAvgSamplesPerSec=6.328290690563437, 
CurrSamplesPerSec=5.6815291132287555, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:35:46,924] [INFO] [timer.py:197:stop] 0/3348, RunningAvgSamplesPerSec=6.328295836466862, CurrSamplesPerSec=5.713784203464906, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:35:58,407] [INFO] [timer.py:197:stop] 0/3350, RunningAvgSamplesPerSec=6.328292604414985, CurrSamplesPerSec=5.69902773076984, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0034, 'learning_rate': 7.402222222222223e-06, 'epoch': 7.1} [2022-12-17 01:36:09,721] [INFO] [timer.py:197:stop] 0/3352, RunningAvgSamplesPerSec=6.328298902158826, CurrSamplesPerSec=5.690510203449995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:36:21,224] [INFO] [timer.py:197:stop] 0/3354, RunningAvgSamplesPerSec=6.328302617299473, CurrSamplesPerSec=5.688378710880185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:36:32,808] [INFO] [timer.py:197:stop] 0/3356, RunningAvgSamplesPerSec=6.328292203451692, CurrSamplesPerSec=5.710444018151198, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:36:44,164] [INFO] [timer.py:197:stop] 0/3358, RunningAvgSamplesPerSec=6.328287530561861, CurrSamplesPerSec=5.666746435361096, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:36:55,655] [INFO] [logging.py:68:log_dist] [Rank 0] step=1680, skipped=5, lr=[7.3911111111111125e-06], mom=[[0.9, 0.999]] [2022-12-17 01:36:55,657] [INFO] [timer.py:197:stop] 0/3360, RunningAvgSamplesPerSec=6.328284289899163, CurrSamplesPerSec=5.6893407959260545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:37:07,021] [INFO] [timer.py:197:stop] 0/3362, RunningAvgSamplesPerSec=6.3282828404144675, CurrSamplesPerSec=5.696201517383088, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:37:18,338] [INFO] [timer.py:197:stop] 0/3364, RunningAvgSamplesPerSec=6.328281482380329, CurrSamplesPerSec=5.668719525154133, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:37:29,622] [INFO] 
[timer.py:197:stop] 0/3366, RunningAvgSamplesPerSec=6.328287682718244, CurrSamplesPerSec=5.695697761521486, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:37:40,921] [INFO] [timer.py:197:stop] 0/3368, RunningAvgSamplesPerSec=6.328288491734812, CurrSamplesPerSec=5.6962090115450135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:37:52,274] [INFO] [timer.py:197:stop] 0/3370, RunningAvgSamplesPerSec=6.328280101909268, CurrSamplesPerSec=5.677296288862611, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:38:03,742] [INFO] [timer.py:197:stop] 0/3372, RunningAvgSamplesPerSec=6.328229320453348, CurrSamplesPerSec=5.558497111329943, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:38:15,340] [INFO] [timer.py:197:stop] 0/3374, RunningAvgSamplesPerSec=6.328237197685572, CurrSamplesPerSec=5.704464547971759, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:38:26,658] [INFO] [timer.py:197:stop] 0/3376, RunningAvgSamplesPerSec=6.328234872449634, CurrSamplesPerSec=5.693225477220795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:38:38,015] [INFO] [timer.py:197:stop] 0/3378, RunningAvgSamplesPerSec=6.328223671446257, CurrSamplesPerSec=5.65220305102006, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:38:49,472] [INFO] [logging.py:68:log_dist] [Rank 0] step=1690, skipped=5, lr=[7.36888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 01:38:49,474] [INFO] [timer.py:197:stop] 0/3380, RunningAvgSamplesPerSec=6.3282343269547425, CurrSamplesPerSec=5.715569424413849, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:39:00,780] [INFO] [timer.py:197:stop] 0/3382, RunningAvgSamplesPerSec=6.32824550333762, CurrSamplesPerSec=5.712211350120307, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:39:12,225] [INFO] [timer.py:197:stop] 0/3384, RunningAvgSamplesPerSec=6.3282038666680345, CurrSamplesPerSec=5.56816756394146, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:39:23,704] [INFO] 
[timer.py:197:stop] 0/3386, RunningAvgSamplesPerSec=6.32821226984345, CurrSamplesPerSec=5.716607199656674, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:39:35,238] [INFO] [timer.py:197:stop] 0/3388, RunningAvgSamplesPerSec=6.328216059373954, CurrSamplesPerSec=5.69550392144332, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:39:46,587] [INFO] [timer.py:197:stop] 0/3390, RunningAvgSamplesPerSec=6.328207698527992, CurrSamplesPerSec=5.657910737886335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:39:58,109] [INFO] [timer.py:197:stop] 0/3392, RunningAvgSamplesPerSec=6.328200542356812, CurrSamplesPerSec=5.686295550023471, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:40:09,520] [INFO] [timer.py:197:stop] 0/3394, RunningAvgSamplesPerSec=6.328197617950004, CurrSamplesPerSec=5.683399158756303, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:40:20,976] [INFO] [timer.py:197:stop] 0/3396, RunningAvgSamplesPerSec=6.328150606424908, CurrSamplesPerSec=5.551615532162709, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:40:32,598] [INFO] [timer.py:197:stop] 0/3398, RunningAvgSamplesPerSec=6.328154679161973, CurrSamplesPerSec=5.7109042162776715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:40:44,064] [INFO] [logging.py:68:log_dist] [Rank 0] step=1700, skipped=5, lr=[7.346666666666668e-06], mom=[[0.9, 0.999]] [2022-12-17 01:40:44,066] [INFO] [timer.py:197:stop] 0/3400, RunningAvgSamplesPerSec=6.328143141046826, CurrSamplesPerSec=5.655962324635651, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0029, 'learning_rate': 7.346666666666668e-06, 'epoch': 7.2} [2022-12-17 01:40:55,365] [INFO] [timer.py:197:stop] 0/3402, RunningAvgSamplesPerSec=6.328155188947058, CurrSamplesPerSec=5.71487048352276, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:41:06,858] [INFO] [timer.py:197:stop] 0/3404, RunningAvgSamplesPerSec=6.328172193969021, CurrSamplesPerSec=5.71332500030223, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 01:41:18,279] [INFO] [timer.py:197:stop] 0/3406, RunningAvgSamplesPerSec=6.328175729615554, CurrSamplesPerSec=5.685131486033896, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:41:29,613] [INFO] [timer.py:197:stop] 0/3408, RunningAvgSamplesPerSec=6.32816802041746, CurrSamplesPerSec=5.6532679449227095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:41:41,121] [INFO] [timer.py:197:stop] 0/3410, RunningAvgSamplesPerSec=6.32818552917702, CurrSamplesPerSec=5.728407024762613, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:41:52,371] [INFO] [timer.py:197:stop] 0/3412, RunningAvgSamplesPerSec=6.328197040621683, CurrSamplesPerSec=5.717122941010747, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:42:03,686] [INFO] [timer.py:197:stop] 0/3414, RunningAvgSamplesPerSec=6.328196403591567, CurrSamplesPerSec=5.661743287944973, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:42:14,985] [INFO] [timer.py:197:stop] 0/3416, RunningAvgSamplesPerSec=6.328201656772722, CurrSamplesPerSec=5.6989974825817145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:42:26,307] [INFO] [timer.py:197:stop] 0/3418, RunningAvgSamplesPerSec=6.328212993972193, CurrSamplesPerSec=5.722051035744155, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:42:37,816] [INFO] [logging.py:68:log_dist] [Rank 0] step=1710, skipped=5, lr=[7.324444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 01:42:37,817] [INFO] [timer.py:197:stop] 0/3420, RunningAvgSamplesPerSec=6.328228305425794, CurrSamplesPerSec=5.724880743494684, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:42:49,145] [INFO] [timer.py:197:stop] 0/3422, RunningAvgSamplesPerSec=6.32823030158857, CurrSamplesPerSec=5.703642280470542, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:43:00,666] [INFO] [timer.py:197:stop] 0/3424, RunningAvgSamplesPerSec=6.328230234209142, CurrSamplesPerSec=5.680257617713291, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 01:43:12,106] [INFO] [timer.py:197:stop] 0/3426, RunningAvgSamplesPerSec=6.328209177056322, CurrSamplesPerSec=5.696820698340513, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:43:23,409] [INFO] [timer.py:197:stop] 0/3428, RunningAvgSamplesPerSec=6.3282132302632395, CurrSamplesPerSec=5.693795460828744, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:43:34,701] [INFO] [timer.py:197:stop] 0/3430, RunningAvgSamplesPerSec=6.328227607550369, CurrSamplesPerSec=5.724152425560023, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:43:46,016] [INFO] [timer.py:197:stop] 0/3432, RunningAvgSamplesPerSec=6.328220581200692, CurrSamplesPerSec=5.687688573141918, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:43:57,323] [INFO] [timer.py:197:stop] 0/3434, RunningAvgSamplesPerSec=6.328222375359132, CurrSamplesPerSec=5.694503027191691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:44:08,631] [INFO] [timer.py:197:stop] 0/3436, RunningAvgSamplesPerSec=6.3282244903867655, CurrSamplesPerSec=5.698355328825123, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:44:19,966] [INFO] [timer.py:197:stop] 0/3438, RunningAvgSamplesPerSec=6.328221378368761, CurrSamplesPerSec=5.709337076552991, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:44:31,309] [INFO] [logging.py:68:log_dist] [Rank 0] step=1720, skipped=5, lr=[7.302222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 01:44:31,311] [INFO] [timer.py:197:stop] 0/3440, RunningAvgSamplesPerSec=6.328215230487609, CurrSamplesPerSec=5.695270702428186, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:44:42,613] [INFO] [timer.py:197:stop] 0/3442, RunningAvgSamplesPerSec=6.328213318697902, CurrSamplesPerSec=5.67559753251525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:44:54,239] [INFO] [timer.py:197:stop] 0/3444, RunningAvgSamplesPerSec=6.328210899339845, CurrSamplesPerSec=5.709816528518276, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 01:45:05,842] [INFO] [timer.py:197:stop] 0/3446, RunningAvgSamplesPerSec=6.328206637922383, CurrSamplesPerSec=5.678575828674865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:45:17,347] [INFO] [timer.py:197:stop] 0/3448, RunningAvgSamplesPerSec=6.328141753874068, CurrSamplesPerSec=5.502676399186368, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:45:28,891] [INFO] [timer.py:197:stop] 0/3450, RunningAvgSamplesPerSec=6.328143387054643, CurrSamplesPerSec=5.6908574035335215, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0031, 'learning_rate': 7.291111111111112e-06, 'epoch': 7.31} [2022-12-17 01:45:40,214] [INFO] [timer.py:197:stop] 0/3452, RunningAvgSamplesPerSec=6.328145290413738, CurrSamplesPerSec=5.703460259781672, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:45:51,590] [INFO] [timer.py:197:stop] 0/3454, RunningAvgSamplesPerSec=6.328127466213744, CurrSamplesPerSec=5.6478186708970615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:46:02,864] [INFO] [timer.py:197:stop] 0/3456, RunningAvgSamplesPerSec=6.328141249502755, CurrSamplesPerSec=5.700699381725708, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:46:14,228] [INFO] [timer.py:197:stop] 0/3458, RunningAvgSamplesPerSec=6.328139444500353, CurrSamplesPerSec=5.6923542991128775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:46:25,591] [INFO] [logging.py:68:log_dist] [Rank 0] step=1730, skipped=5, lr=[7.280000000000001e-06], mom=[[0.9, 0.999]] [2022-12-17 01:46:25,593] [INFO] [timer.py:197:stop] 0/3460, RunningAvgSamplesPerSec=6.328127248928353, CurrSamplesPerSec=5.651214461156085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:46:36,918] [INFO] [timer.py:197:stop] 0/3462, RunningAvgSamplesPerSec=6.328130860638707, CurrSamplesPerSec=5.695780667229012, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:46:48,282] [INFO] [timer.py:197:stop] 0/3464, 
RunningAvgSamplesPerSec=6.328137597693461, CurrSamplesPerSec=5.706588461866827, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:46:59,698] [INFO] [timer.py:197:stop] 0/3466, RunningAvgSamplesPerSec=6.328131763092281, CurrSamplesPerSec=5.690822898780078, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:47:11,036] [INFO] [timer.py:197:stop] 0/3468, RunningAvgSamplesPerSec=6.328129722496397, CurrSamplesPerSec=5.689334284482466, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:47:22,570] [INFO] [timer.py:197:stop] 0/3470, RunningAvgSamplesPerSec=6.3281306644154665, CurrSamplesPerSec=5.6844936568667706, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:47:34,221] [INFO] [timer.py:197:stop] 0/3472, RunningAvgSamplesPerSec=6.328106526053606, CurrSamplesPerSec=5.656347037051428, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:47:45,572] [INFO] [timer.py:197:stop] 0/3474, RunningAvgSamplesPerSec=6.328093162953867, CurrSamplesPerSec=5.674111114859532, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:47:56,891] [INFO] [timer.py:197:stop] 0/3476, RunningAvgSamplesPerSec=6.328105604170656, CurrSamplesPerSec=5.713234044027637, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:48:08,451] [INFO] [timer.py:197:stop] 0/3478, RunningAvgSamplesPerSec=6.328097448272549, CurrSamplesPerSec=5.6755171332732255, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:48:19,872] [INFO] [logging.py:68:log_dist] [Rank 0] step=1740, skipped=5, lr=[7.257777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 01:48:19,874] [INFO] [timer.py:197:stop] 0/3480, RunningAvgSamplesPerSec=6.32806475412414, CurrSamplesPerSec=5.58979722711045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:48:31,197] [INFO] [timer.py:197:stop] 0/3482, RunningAvgSamplesPerSec=6.328066269637509, CurrSamplesPerSec=5.684986041644773, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:48:42,589] [INFO] [timer.py:197:stop] 0/3484, 
RunningAvgSamplesPerSec=6.3280702776950415, CurrSamplesPerSec=5.7072453349538295, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:48:53,886] [INFO] [timer.py:197:stop] 0/3486, RunningAvgSamplesPerSec=6.328081295245514, CurrSamplesPerSec=5.694302504016668, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:49:05,188] [INFO] [timer.py:197:stop] 0/3488, RunningAvgSamplesPerSec=6.328090685078356, CurrSamplesPerSec=5.711951479849029, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:49:16,555] [INFO] [timer.py:197:stop] 0/3490, RunningAvgSamplesPerSec=6.328082367812376, CurrSamplesPerSec=5.669608151295856, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:49:27,952] [INFO] [timer.py:197:stop] 0/3492, RunningAvgSamplesPerSec=6.328083131471704, CurrSamplesPerSec=5.698459602429974, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:49:39,247] [INFO] [timer.py:197:stop] 0/3494, RunningAvgSamplesPerSec=6.328094558189582, CurrSamplesPerSec=5.7052138129041605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:49:50,620] [INFO] [timer.py:197:stop] 0/3496, RunningAvgSamplesPerSec=6.328078205743374, CurrSamplesPerSec=5.728926609252932, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:50:01,962] [INFO] [timer.py:197:stop] 0/3498, RunningAvgSamplesPerSec=6.328074417141918, CurrSamplesPerSec=5.706643053979031, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:50:13,267] [INFO] [logging.py:68:log_dist] [Rank 0] step=1750, skipped=5, lr=[7.235555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 01:50:13,269] [INFO] [timer.py:197:stop] 0/3500, RunningAvgSamplesPerSec=6.328076788387537, CurrSamplesPerSec=5.706652274083303, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.003, 'learning_rate': 7.235555555555556e-06, 'epoch': 7.42} [2022-12-17 01:50:24,905] [INFO] [timer.py:197:stop] 0/3502, RunningAvgSamplesPerSec=6.328095548443352, CurrSamplesPerSec=5.735608664701889, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-17 01:50:36,205] [INFO] [timer.py:197:stop] 0/3504, RunningAvgSamplesPerSec=6.3281112245341005, CurrSamplesPerSec=5.726553426443668, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:50:47,875] [INFO] [timer.py:197:stop] 0/3506, RunningAvgSamplesPerSec=6.327985124152368, CurrSamplesPerSec=5.32379925311647, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:50:59,259] [INFO] [timer.py:197:stop] 0/3508, RunningAvgSamplesPerSec=6.327988019352231, CurrSamplesPerSec=5.704149382054898, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:51:10,603] [INFO] [timer.py:197:stop] 0/3510, RunningAvgSamplesPerSec=6.327990555329165, CurrSamplesPerSec=5.695556609834613, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:51:22,017] [INFO] [timer.py:197:stop] 0/3512, RunningAvgSamplesPerSec=6.327956274360677, CurrSamplesPerSec=5.582028110491425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:51:33,525] [INFO] [timer.py:197:stop] 0/3514, RunningAvgSamplesPerSec=6.3279682355017295, CurrSamplesPerSec=5.71344684733673, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:51:45,006] [INFO] [timer.py:197:stop] 0/3516, RunningAvgSamplesPerSec=6.3279713819602526, CurrSamplesPerSec=5.690985291933365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:51:56,316] [INFO] [timer.py:197:stop] 0/3518, RunningAvgSamplesPerSec=6.327972735363504, CurrSamplesPerSec=5.682923170379605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:52:07,571] [INFO] [logging.py:68:log_dist] [Rank 0] step=1760, skipped=5, lr=[7.213333333333334e-06], mom=[[0.9, 0.999]] [2022-12-17 01:52:07,573] [INFO] [timer.py:197:stop] 0/3520, RunningAvgSamplesPerSec=6.328000632457452, CurrSamplesPerSec=5.740372033043068, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:52:18,880] [INFO] [timer.py:197:stop] 0/3522, RunningAvgSamplesPerSec=6.328007794070124, CurrSamplesPerSec=5.695015512624356, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
01:52:30,399] [INFO] [timer.py:197:stop] 0/3524, RunningAvgSamplesPerSec=6.327934943934382, CurrSamplesPerSec=5.490017198827353, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:52:41,685] [INFO] [timer.py:197:stop] 0/3526, RunningAvgSamplesPerSec=6.327943588080988, CurrSamplesPerSec=5.716432871876821, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:52:52,954] [INFO] [timer.py:197:stop] 0/3528, RunningAvgSamplesPerSec=6.327953556698756, CurrSamplesPerSec=5.71647304423527, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:53:04,283] [INFO] [timer.py:197:stop] 0/3530, RunningAvgSamplesPerSec=6.327952891375237, CurrSamplesPerSec=5.721589039463135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:53:15,575] [INFO] [timer.py:197:stop] 0/3532, RunningAvgSamplesPerSec=6.32796634313063, CurrSamplesPerSec=5.703908182437717, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:53:26,987] [INFO] [timer.py:197:stop] 0/3534, RunningAvgSamplesPerSec=6.327938074298554, CurrSamplesPerSec=5.603109373402547, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:53:38,263] [INFO] [timer.py:197:stop] 0/3536, RunningAvgSamplesPerSec=6.327952225669314, CurrSamplesPerSec=5.7100358791340815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:53:49,603] [INFO] [timer.py:197:stop] 0/3538, RunningAvgSamplesPerSec=6.327958608696362, CurrSamplesPerSec=5.687503713127321, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:54:00,963] [INFO] [logging.py:68:log_dist] [Rank 0] step=1770, skipped=5, lr=[7.191111111111112e-06], mom=[[0.9, 0.999]] [2022-12-17 01:54:00,965] [INFO] [timer.py:197:stop] 0/3540, RunningAvgSamplesPerSec=6.327936826914226, CurrSamplesPerSec=5.608218640493311, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:54:12,321] [INFO] [timer.py:197:stop] 0/3542, RunningAvgSamplesPerSec=6.3279281870182045, CurrSamplesPerSec=5.630785943403979, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:54:23,603] 
[INFO] [timer.py:197:stop] 0/3544, RunningAvgSamplesPerSec=6.327943843855838, CurrSamplesPerSec=5.717282942013153, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:54:34,916] [INFO] [timer.py:197:stop] 0/3546, RunningAvgSamplesPerSec=6.327943092314098, CurrSamplesPerSec=5.711298140639289, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:54:46,198] [INFO] [timer.py:197:stop] 0/3548, RunningAvgSamplesPerSec=6.3279544527004665, CurrSamplesPerSec=5.705776012064379, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:54:57,683] [INFO] [timer.py:197:stop] 0/3550, RunningAvgSamplesPerSec=6.327904100743331, CurrSamplesPerSec=5.527757973509886, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0028, 'learning_rate': 7.180000000000001e-06, 'epoch': 7.52} [2022-12-17 01:55:08,958] [INFO] [timer.py:197:stop] 0/3552, RunningAvgSamplesPerSec=6.3279175861859605, CurrSamplesPerSec=5.7165658080201425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:55:20,232] [INFO] [timer.py:197:stop] 0/3554, RunningAvgSamplesPerSec=6.327936571369671, CurrSamplesPerSec=5.722424053381545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:55:31,823] [INFO] [timer.py:197:stop] 0/3556, RunningAvgSamplesPerSec=6.327928871169333, CurrSamplesPerSec=5.661712956586682, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:55:43,120] [INFO] [timer.py:197:stop] 0/3558, RunningAvgSamplesPerSec=6.327938712237207, CurrSamplesPerSec=5.704607111379388, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:55:54,456] [INFO] [logging.py:68:log_dist] [Rank 0] step=1780, skipped=5, lr=[7.1688888888888895e-06], mom=[[0.9, 0.999]] [2022-12-17 01:55:54,458] [INFO] [timer.py:197:stop] 0/3560, RunningAvgSamplesPerSec=6.327929659004718, CurrSamplesPerSec=5.653885686286007, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:56:05,826] [INFO] [timer.py:197:stop] 0/3562, RunningAvgSamplesPerSec=6.327918576109895, CurrSamplesPerSec=5.729235959602013, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:56:17,168] [INFO] [timer.py:197:stop] 0/3564, RunningAvgSamplesPerSec=6.327915134381827, CurrSamplesPerSec=5.672132353009539, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:56:28,474] [INFO] [timer.py:197:stop] 0/3566, RunningAvgSamplesPerSec=6.327923942699098, CurrSamplesPerSec=5.682262982816896, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:56:39,759] [INFO] [timer.py:197:stop] 0/3568, RunningAvgSamplesPerSec=6.327934093570251, CurrSamplesPerSec=5.710346593953746, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:56:51,044] [INFO] [timer.py:197:stop] 0/3570, RunningAvgSamplesPerSec=6.327950174366859, CurrSamplesPerSec=5.712983564444499, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:57:02,649] [INFO] [timer.py:197:stop] 0/3572, RunningAvgSamplesPerSec=6.327926860476011, CurrSamplesPerSec=5.694614891436664, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:57:13,930] [INFO] [timer.py:197:stop] 0/3574, RunningAvgSamplesPerSec=6.327938623697873, CurrSamplesPerSec=5.726775042307256, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:57:25,236] [INFO] [timer.py:197:stop] 0/3576, RunningAvgSamplesPerSec=6.327943893060957, CurrSamplesPerSec=5.705478406213814, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:57:36,872] [INFO] [timer.py:197:stop] 0/3578, RunningAvgSamplesPerSec=6.3278364995467475, CurrSamplesPerSec=5.669009956245839, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:57:48,211] [INFO] [logging.py:68:log_dist] [Rank 0] step=1790, skipped=5, lr=[7.146666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 01:57:48,213] [INFO] [timer.py:197:stop] 0/3580, RunningAvgSamplesPerSec=6.327832187742601, CurrSamplesPerSec=5.691824672039203, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:57:59,636] [INFO] [timer.py:197:stop] 0/3582, RunningAvgSamplesPerSec=6.327800200782722, CurrSamplesPerSec=5.603391950625858, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:58:10,952] [INFO] [timer.py:197:stop] 0/3584, RunningAvgSamplesPerSec=6.327805501869577, CurrSamplesPerSec=5.700271330203468, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:58:22,266] [INFO] [timer.py:197:stop] 0/3586, RunningAvgSamplesPerSec=6.327811376478618, CurrSamplesPerSec=5.715144490811053, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:58:33,529] [INFO] [timer.py:197:stop] 0/3588, RunningAvgSamplesPerSec=6.327829616725786, CurrSamplesPerSec=5.723332526685991, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:58:44,848] [INFO] [timer.py:197:stop] 0/3590, RunningAvgSamplesPerSec=6.327828118350446, CurrSamplesPerSec=5.67359015323349, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:58:56,165] [INFO] [timer.py:197:stop] 0/3592, RunningAvgSamplesPerSec=6.327827434121203, CurrSamplesPerSec=5.677497297199891, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:59:07,479] [INFO] [timer.py:197:stop] 0/3594, RunningAvgSamplesPerSec=6.327833910620886, CurrSamplesPerSec=5.710358255527924, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:59:18,805] [INFO] [timer.py:197:stop] 0/3596, RunningAvgSamplesPerSec=6.327836396013567, CurrSamplesPerSec=5.689810141446445, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:59:30,151] [INFO] [timer.py:197:stop] 0/3598, RunningAvgSamplesPerSec=6.327831213010918, CurrSamplesPerSec=5.687007036678088, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 01:59:41,493] [INFO] [logging.py:68:log_dist] [Rank 0] step=1800, skipped=5, lr=[7.124444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 01:59:41,495] [INFO] [timer.py:197:stop] 0/3600, RunningAvgSamplesPerSec=6.327827075130297, CurrSamplesPerSec=5.684304911920582, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0039, 'learning_rate': 7.124444444444445e-06, 'epoch': 7.63} [2022-12-17 01:59:52,885] [INFO] [timer.py:197:stop] 0/3602, 
RunningAvgSamplesPerSec=6.327801323103608, CurrSamplesPerSec=5.704719615425174, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:00:04,167] [INFO] [timer.py:197:stop] 0/3604, RunningAvgSamplesPerSec=6.327815179366836, CurrSamplesPerSec=5.712152032465933, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:00:15,433] [INFO] [timer.py:197:stop] 0/3606, RunningAvgSamplesPerSec=6.32784080816605, CurrSamplesPerSec=5.72127319793468, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:00:26,682] [INFO] [timer.py:197:stop] 0/3608, RunningAvgSamplesPerSec=6.327863358791299, CurrSamplesPerSec=5.728916583436902, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:00:37,963] [INFO] [timer.py:197:stop] 0/3610, RunningAvgSamplesPerSec=6.327880859980065, CurrSamplesPerSec=5.719402277828227, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:00:49,236] [INFO] [timer.py:197:stop] 0/3612, RunningAvgSamplesPerSec=6.32789408836473, CurrSamplesPerSec=5.702605087742875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:01:01,131] [INFO] [timer.py:197:stop] 0/3614, RunningAvgSamplesPerSec=6.327892039894871, CurrSamplesPerSec=5.668299135170656, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:01:12,791] [INFO] [timer.py:197:stop] 0/3616, RunningAvgSamplesPerSec=6.327888579025662, CurrSamplesPerSec=5.682090502814677, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:01:24,764] [INFO] [timer.py:197:stop] 0/3618, RunningAvgSamplesPerSec=6.327884018322994, CurrSamplesPerSec=5.668504534331354, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:01:36,124] [INFO] [logging.py:68:log_dist] [Rank 0] step=1810, skipped=5, lr=[7.102222222222222e-06], mom=[[0.9, 0.999]] [2022-12-17 02:01:36,125] [INFO] [timer.py:197:stop] 0/3620, RunningAvgSamplesPerSec=6.3278921259197904, CurrSamplesPerSec=5.714443464157503, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:01:47,415] [INFO] [timer.py:197:stop] 0/3622, 
RunningAvgSamplesPerSec=6.3279065726852615, CurrSamplesPerSec=5.725988835540606, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:01:58,700] [INFO] [timer.py:197:stop] 0/3624, RunningAvgSamplesPerSec=6.327922066863667, CurrSamplesPerSec=5.720485819054075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:02:10,061] [INFO] [timer.py:197:stop] 0/3626, RunningAvgSamplesPerSec=6.327913184941603, CurrSamplesPerSec=5.708340783251031, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:02:21,389] [INFO] [timer.py:197:stop] 0/3628, RunningAvgSamplesPerSec=6.327922140001181, CurrSamplesPerSec=5.695893065117951, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:02:32,688] [INFO] [timer.py:197:stop] 0/3630, RunningAvgSamplesPerSec=6.327936577561977, CurrSamplesPerSec=5.712554395636458, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:02:44,121] [INFO] [timer.py:197:stop] 0/3632, RunningAvgSamplesPerSec=6.327962876653724, CurrSamplesPerSec=5.729724140830995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:02:55,432] [INFO] [timer.py:197:stop] 0/3634, RunningAvgSamplesPerSec=6.327980036999522, CurrSamplesPerSec=5.727084404708527, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:03:06,839] [INFO] [timer.py:197:stop] 0/3636, RunningAvgSamplesPerSec=6.327956709513156, CurrSamplesPerSec=5.602805073350317, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:03:18,401] [INFO] [timer.py:197:stop] 0/3638, RunningAvgSamplesPerSec=6.32797987065095, CurrSamplesPerSec=5.736248700806572, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:03:29,754] [INFO] [logging.py:68:log_dist] [Rank 0] step=1820, skipped=5, lr=[7.08e-06], mom=[[0.9, 0.999]] [2022-12-17 02:03:29,755] [INFO] [timer.py:197:stop] 0/3640, RunningAvgSamplesPerSec=6.327993811918197, CurrSamplesPerSec=5.707425413016101, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:03:41,156] [INFO] [timer.py:197:stop] 0/3642, 
RunningAvgSamplesPerSec=6.327964199154306, CurrSamplesPerSec=5.58800687342309, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:03:52,578] [INFO] [timer.py:197:stop] 0/3644, RunningAvgSamplesPerSec=6.3279903684733325, CurrSamplesPerSec=5.734441965307107, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:04:03,840] [INFO] [timer.py:197:stop] 0/3646, RunningAvgSamplesPerSec=6.328012406815716, CurrSamplesPerSec=5.729712644625185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:04:15,208] [INFO] [timer.py:197:stop] 0/3648, RunningAvgSamplesPerSec=6.327998777121916, CurrSamplesPerSec=5.6463310889257095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:04:26,567] [INFO] [timer.py:197:stop] 0/3650, RunningAvgSamplesPerSec=6.328016363517222, CurrSamplesPerSec=5.729583498930007, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0031, 'learning_rate': 7.06888888888889e-06, 'epoch': 7.73} [2022-12-17 02:04:37,835] [INFO] [timer.py:197:stop] 0/3652, RunningAvgSamplesPerSec=6.3280379957539425, CurrSamplesPerSec=5.750655218175977, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:04:49,140] [INFO] [timer.py:197:stop] 0/3654, RunningAvgSamplesPerSec=6.328047093226549, CurrSamplesPerSec=5.692673716223513, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:05:00,555] [INFO] [timer.py:197:stop] 0/3656, RunningAvgSamplesPerSec=6.328070328408502, CurrSamplesPerSec=5.729652962936433, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:05:11,824] [INFO] [timer.py:197:stop] 0/3658, RunningAvgSamplesPerSec=6.328090833625429, CurrSamplesPerSec=5.722410634637678, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:05:23,067] [INFO] [logging.py:68:log_dist] [Rank 0] step=1830, skipped=5, lr=[7.057777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 02:05:23,068] [INFO] [timer.py:197:stop] 0/3660, RunningAvgSamplesPerSec=6.328114198755333, CurrSamplesPerSec=5.753315006315808, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-17 02:05:34,318] [INFO] [timer.py:197:stop] 0/3662, RunningAvgSamplesPerSec=6.328140992662758, CurrSamplesPerSec=5.763149688904019, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:05:45,814] [INFO] [timer.py:197:stop] 0/3664, RunningAvgSamplesPerSec=6.328166430047528, CurrSamplesPerSec=5.744112367045322, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:05:57,107] [INFO] [timer.py:197:stop] 0/3666, RunningAvgSamplesPerSec=6.328179424638803, CurrSamplesPerSec=5.704464790420485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:06:08,364] [INFO] [timer.py:197:stop] 0/3668, RunningAvgSamplesPerSec=6.328198479294168, CurrSamplesPerSec=5.732478199279469, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:06:19,630] [INFO] [timer.py:197:stop] 0/3670, RunningAvgSamplesPerSec=6.3282200623513365, CurrSamplesPerSec=5.71588853121749, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:06:30,920] [INFO] [timer.py:197:stop] 0/3672, RunningAvgSamplesPerSec=6.328233558552581, CurrSamplesPerSec=5.725239477060349, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:06:42,190] [INFO] [timer.py:197:stop] 0/3674, RunningAvgSamplesPerSec=6.3282541228592954, CurrSamplesPerSec=5.716858727606408, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:06:53,472] [INFO] [timer.py:197:stop] 0/3676, RunningAvgSamplesPerSec=6.328270568160537, CurrSamplesPerSec=5.739481949746398, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:07:04,744] [INFO] [timer.py:197:stop] 0/3678, RunningAvgSamplesPerSec=6.328284690470813, CurrSamplesPerSec=5.716911568333583, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:07:16,014] [INFO] [logging.py:68:log_dist] [Rank 0] step=1840, skipped=5, lr=[7.035555555555557e-06], mom=[[0.9, 0.999]] [2022-12-17 02:07:16,016] [INFO] [timer.py:197:stop] 0/3680, RunningAvgSamplesPerSec=6.328304220828954, CurrSamplesPerSec=5.7337428103257135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
02:07:27,284] [INFO] [timer.py:197:stop] 0/3682, RunningAvgSamplesPerSec=6.328324947917273, CurrSamplesPerSec=5.739207076337193, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:07:38,555] [INFO] [timer.py:197:stop] 0/3684, RunningAvgSamplesPerSec=6.3283337295802875, CurrSamplesPerSec=5.710208845213026, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:07:49,871] [INFO] [timer.py:197:stop] 0/3686, RunningAvgSamplesPerSec=6.328337637182703, CurrSamplesPerSec=5.692891027263141, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:08:01,209] [INFO] [timer.py:197:stop] 0/3688, RunningAvgSamplesPerSec=6.328334295974981, CurrSamplesPerSec=5.6979128737779705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:08:12,502] [INFO] [timer.py:197:stop] 0/3690, RunningAvgSamplesPerSec=6.328341546989031, CurrSamplesPerSec=5.702536278019622, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:08:23,789] [INFO] [timer.py:197:stop] 0/3692, RunningAvgSamplesPerSec=6.328349056114674, CurrSamplesPerSec=5.704499945703904, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:08:35,047] [INFO] [timer.py:197:stop] 0/3694, RunningAvgSamplesPerSec=6.32836730817153, CurrSamplesPerSec=5.725636847106058, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:08:46,582] [INFO] [timer.py:197:stop] 0/3696, RunningAvgSamplesPerSec=6.328376371606257, CurrSamplesPerSec=5.706006938727323, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:08:57,904] [INFO] [timer.py:197:stop] 0/3698, RunningAvgSamplesPerSec=6.328378093780299, CurrSamplesPerSec=5.7031231522802575, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:09:09,196] [INFO] [logging.py:68:log_dist] [Rank 0] step=1850, skipped=5, lr=[7.0133333333333345e-06], mom=[[0.9, 0.999]] [2022-12-17 02:09:09,198] [INFO] [timer.py:197:stop] 0/3700, RunningAvgSamplesPerSec=6.328390288445043, CurrSamplesPerSec=5.716338164733072, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0027, 
'learning_rate': 7.0133333333333345e-06, 'epoch': 7.84} [2022-12-17 02:09:20,481] [INFO] [timer.py:197:stop] 0/3702, RunningAvgSamplesPerSec=6.328405384962958, CurrSamplesPerSec=5.730546116797611, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:09:31,754] [INFO] [timer.py:197:stop] 0/3704, RunningAvgSamplesPerSec=6.328418593830659, CurrSamplesPerSec=5.720126705806713, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:09:43,071] [INFO] [timer.py:197:stop] 0/3706, RunningAvgSamplesPerSec=6.3284230163254644, CurrSamplesPerSec=5.692320500457679, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:09:54,375] [INFO] [timer.py:197:stop] 0/3708, RunningAvgSamplesPerSec=6.32843249908911, CurrSamplesPerSec=5.7056054973347035, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:10:05,694] [INFO] [timer.py:197:stop] 0/3710, RunningAvgSamplesPerSec=6.328436656248839, CurrSamplesPerSec=5.7148286303710005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:10:16,980] [INFO] [timer.py:197:stop] 0/3712, RunningAvgSamplesPerSec=6.328446140698633, CurrSamplesPerSec=5.71491842073463, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:10:28,279] [INFO] [timer.py:197:stop] 0/3714, RunningAvgSamplesPerSec=6.328459971048828, CurrSamplesPerSec=5.726909192988373, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:10:39,581] [INFO] [timer.py:197:stop] 0/3716, RunningAvgSamplesPerSec=6.328468354521305, CurrSamplesPerSec=5.703816313777253, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:10:50,907] [INFO] [timer.py:197:stop] 0/3718, RunningAvgSamplesPerSec=6.328463095854033, CurrSamplesPerSec=5.686705361943823, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:11:02,197] [INFO] [logging.py:68:log_dist] [Rank 0] step=1860, skipped=5, lr=[6.991111111111112e-06], mom=[[0.9, 0.999]] [2022-12-17 02:11:02,198] [INFO] [timer.py:197:stop] 0/3720, RunningAvgSamplesPerSec=6.328475951506111, CurrSamplesPerSec=5.72845078847984, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:11:13,481] [INFO] [timer.py:197:stop] 0/3722, RunningAvgSamplesPerSec=6.328491494415783, CurrSamplesPerSec=5.727654589014197, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:11:24,776] [INFO] [timer.py:197:stop] 0/3724, RunningAvgSamplesPerSec=6.328503029556852, CurrSamplesPerSec=5.7198580707958415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:11:36,078] [INFO] [timer.py:197:stop] 0/3726, RunningAvgSamplesPerSec=6.328512845945567, CurrSamplesPerSec=5.716436036951238, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:11:47,389] [INFO] [timer.py:197:stop] 0/3728, RunningAvgSamplesPerSec=6.328519648234641, CurrSamplesPerSec=5.696660631635473, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:11:58,710] [INFO] [timer.py:197:stop] 0/3730, RunningAvgSamplesPerSec=6.3285224714645985, CurrSamplesPerSec=5.706504755990071, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:12:10,021] [INFO] [timer.py:197:stop] 0/3732, RunningAvgSamplesPerSec=6.328523854481922, CurrSamplesPerSec=5.680343440173591, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:12:21,373] [INFO] [timer.py:197:stop] 0/3734, RunningAvgSamplesPerSec=6.328516952220104, CurrSamplesPerSec=5.674356038258372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:12:32,655] [INFO] [timer.py:197:stop] 0/3736, RunningAvgSamplesPerSec=6.328527993386, CurrSamplesPerSec=5.715351595458335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:12:43,985] [INFO] [timer.py:197:stop] 0/3738, RunningAvgSamplesPerSec=6.328523357191348, CurrSamplesPerSec=5.673340020114448, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:12:55,366] [INFO] [logging.py:68:log_dist] [Rank 0] step=1870, skipped=5, lr=[6.96888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 02:12:55,367] [INFO] [timer.py:197:stop] 0/3740, RunningAvgSamplesPerSec=6.328530895199716, CurrSamplesPerSec=5.704396420696285, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:13:06,720] [INFO] [timer.py:197:stop] 0/3742, RunningAvgSamplesPerSec=6.32852430551046, CurrSamplesPerSec=5.666163435402976, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:13:18,043] [INFO] [timer.py:197:stop] 0/3744, RunningAvgSamplesPerSec=6.328526981645077, CurrSamplesPerSec=5.691222021288974, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:13:29,477] [INFO] [timer.py:197:stop] 0/3746, RunningAvgSamplesPerSec=6.32854021618564, CurrSamplesPerSec=5.715457222174339, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:13:40,814] [INFO] [timer.py:197:stop] 0/3748, RunningAvgSamplesPerSec=6.32853715357103, CurrSamplesPerSec=5.673013417356105, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:13:52,112] [INFO] [timer.py:197:stop] 0/3750, RunningAvgSamplesPerSec=6.3285428126775445, CurrSamplesPerSec=5.69774863376101, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0034, 'learning_rate': 6.9577777777777785e-06, 'epoch': 7.94} [2022-12-17 02:14:03,395] [INFO] [timer.py:197:stop] 0/3752, RunningAvgSamplesPerSec=6.32855860428077, CurrSamplesPerSec=5.7144400579917765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:14:14,669] [INFO] [timer.py:197:stop] 0/3754, RunningAvgSamplesPerSec=6.32857254990377, CurrSamplesPerSec=5.707274214632692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:14:25,976] [INFO] [timer.py:197:stop] 0/3756, RunningAvgSamplesPerSec=6.32858012388723, CurrSamplesPerSec=5.7056343604201665, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:14:37,275] [INFO] [timer.py:197:stop] 0/3758, RunningAvgSamplesPerSec=6.328582928084006, CurrSamplesPerSec=5.690336016171647, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:14:48,581] [INFO] [logging.py:68:log_dist] [Rank 0] step=1880, skipped=5, lr=[6.946666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 02:14:48,583] [INFO] [timer.py:197:stop] 0/3760, 
RunningAvgSamplesPerSec=6.328590303890887, CurrSamplesPerSec=5.710872140930814, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:14:59,882] [INFO] [timer.py:197:stop] 0/3762, RunningAvgSamplesPerSec=6.328600848606018, CurrSamplesPerSec=5.727032841987243, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:15:11,192] [INFO] [timer.py:197:stop] 0/3764, RunningAvgSamplesPerSec=6.328607403088262, CurrSamplesPerSec=5.707744824701184, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:15:22,491] [INFO] [timer.py:197:stop] 0/3766, RunningAvgSamplesPerSec=6.328617861925704, CurrSamplesPerSec=5.7186139518732695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:15:33,866] [INFO] [timer.py:197:stop] 0/3768, RunningAvgSamplesPerSec=6.328603286767437, CurrSamplesPerSec=5.653605852250859, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:15:45,131] [INFO] [timer.py:197:stop] 0/3770, RunningAvgSamplesPerSec=6.3286202165529755, CurrSamplesPerSec=5.713061624172338, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:15:56,520] [INFO] [timer.py:197:stop] 0/3772, RunningAvgSamplesPerSec=6.328601493207364, CurrSamplesPerSec=5.618634523949762, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:16:07,950] [INFO] [timer.py:197:stop] 0/3774, RunningAvgSamplesPerSec=6.328591767717235, CurrSamplesPerSec=5.6332066490927115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:16:16,490] [INFO] [timer.py:197:stop] 0/3776, RunningAvgSamplesPerSec=6.329405434285961, CurrSamplesPerSec=10.041965532176695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:16:27,738] [INFO] [timer.py:197:stop] 0/3778, RunningAvgSamplesPerSec=6.329430791527427, CurrSamplesPerSec=5.735725581354602, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:16:39,053] [INFO] [logging.py:68:log_dist] [Rank 0] step=1890, skipped=5, lr=[6.924444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 02:16:39,055] [INFO] [timer.py:197:stop] 0/3780, 
RunningAvgSamplesPerSec=6.329434823557692, CurrSamplesPerSec=5.699065722949043, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:16:50,358] [INFO] [timer.py:197:stop] 0/3782, RunningAvgSamplesPerSec=6.329442138280489, CurrSamplesPerSec=5.71940057178781, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:17:01,661] [INFO] [timer.py:197:stop] 0/3784, RunningAvgSamplesPerSec=6.329449934445086, CurrSamplesPerSec=5.722445279523075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:17:12,916] [INFO] [timer.py:197:stop] 0/3786, RunningAvgSamplesPerSec=6.329473072416305, CurrSamplesPerSec=5.758911295903981, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:17:24,186] [INFO] [timer.py:197:stop] 0/3788, RunningAvgSamplesPerSec=6.329486457479711, CurrSamplesPerSec=5.717976383190484, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:17:35,494] [INFO] [timer.py:197:stop] 0/3790, RunningAvgSamplesPerSec=6.329492173183151, CurrSamplesPerSec=5.6981175218636375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:17:46,888] [INFO] [timer.py:197:stop] 0/3792, RunningAvgSamplesPerSec=6.329469265360669, CurrSamplesPerSec=5.614493217525514, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:17:58,160] [INFO] [timer.py:197:stop] 0/3794, RunningAvgSamplesPerSec=6.3294846704318015, CurrSamplesPerSec=5.712998884409758, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:18:09,443] [INFO] [timer.py:197:stop] 0/3796, RunningAvgSamplesPerSec=6.32949887820273, CurrSamplesPerSec=5.709194033786988, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:18:20,683] [INFO] [timer.py:197:stop] 0/3798, RunningAvgSamplesPerSec=6.329520651840991, CurrSamplesPerSec=5.732598171337711, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:18:32,056] [INFO] [logging.py:68:log_dist] [Rank 0] step=1900, skipped=5, lr=[6.902222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 02:18:32,058] [INFO] [timer.py:197:stop] 0/3800, 
RunningAvgSamplesPerSec=6.329503954260459, CurrSamplesPerSec=5.640483877827287, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0033, 'learning_rate': 6.902222222222223e-06, 'epoch': 8.05} [2022-12-17 02:18:43,327] [INFO] [timer.py:197:stop] 0/3802, RunningAvgSamplesPerSec=6.32952213607384, CurrSamplesPerSec=5.727358851306722, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:18:54,642] [INFO] [timer.py:197:stop] 0/3804, RunningAvgSamplesPerSec=6.329532860172703, CurrSamplesPerSec=5.715511497327968, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:19:05,943] [INFO] [timer.py:197:stop] 0/3806, RunningAvgSamplesPerSec=6.329545981695896, CurrSamplesPerSec=5.713660152961128, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:19:17,382] [INFO] [timer.py:197:stop] 0/3808, RunningAvgSamplesPerSec=6.329550794690124, CurrSamplesPerSec=5.6937915961488414, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:19:28,637] [INFO] [timer.py:197:stop] 0/3810, RunningAvgSamplesPerSec=6.329564877775189, CurrSamplesPerSec=5.722360863846393, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:19:39,942] [INFO] [timer.py:197:stop] 0/3812, RunningAvgSamplesPerSec=6.329571525169608, CurrSamplesPerSec=5.707313772986497, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:19:51,252] [INFO] [timer.py:197:stop] 0/3814, RunningAvgSamplesPerSec=6.3295763320249305, CurrSamplesPerSec=5.712068406390203, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:20:02,584] [INFO] [timer.py:197:stop] 0/3816, RunningAvgSamplesPerSec=6.329574370163669, CurrSamplesPerSec=5.693129605597974, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:20:13,911] [INFO] [timer.py:197:stop] 0/3818, RunningAvgSamplesPerSec=6.329574490134868, CurrSamplesPerSec=5.686367822968028, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:20:25,218] [INFO] [logging.py:68:log_dist] [Rank 0] step=1910, skipped=5, lr=[6.88e-06], mom=[[0.9, 0.999]] [2022-12-17 
02:20:25,219] [INFO] [timer.py:197:stop] 0/3820, RunningAvgSamplesPerSec=6.329570251685095, CurrSamplesPerSec=5.675956596931117, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:20:36,536] [INFO] [timer.py:197:stop] 0/3822, RunningAvgSamplesPerSec=6.3295741225492925, CurrSamplesPerSec=5.706256321703277, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:20:47,844] [INFO] [timer.py:197:stop] 0/3824, RunningAvgSamplesPerSec=6.3295797971395995, CurrSamplesPerSec=5.717985883552623, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:20:59,133] [INFO] [timer.py:197:stop] 0/3826, RunningAvgSamplesPerSec=6.3295918030274505, CurrSamplesPerSec=5.710409275474985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:21:10,404] [INFO] [timer.py:197:stop] 0/3828, RunningAvgSamplesPerSec=6.329599897041199, CurrSamplesPerSec=5.706553766089495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:21:21,731] [INFO] [timer.py:197:stop] 0/3830, RunningAvgSamplesPerSec=6.329599217752504, CurrSamplesPerSec=5.688717211971074, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:21:33,027] [INFO] [timer.py:197:stop] 0/3832, RunningAvgSamplesPerSec=6.329603904750574, CurrSamplesPerSec=5.703930968262709, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:21:44,338] [INFO] [timer.py:197:stop] 0/3834, RunningAvgSamplesPerSec=6.329608718741985, CurrSamplesPerSec=5.703741172745922, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:21:55,646] [INFO] [timer.py:197:stop] 0/3836, RunningAvgSamplesPerSec=6.3296147038120765, CurrSamplesPerSec=5.7027053978494635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:22:06,956] [INFO] [timer.py:197:stop] 0/3838, RunningAvgSamplesPerSec=6.329619610197565, CurrSamplesPerSec=5.684683858968662, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:22:18,241] [INFO] [logging.py:68:log_dist] [Rank 0] step=1920, skipped=5, lr=[6.857777777777779e-06], mom=[[0.9, 0.999]] [2022-12-17 
02:22:18,242] [INFO] [timer.py:197:stop] 0/3840, RunningAvgSamplesPerSec=6.329632723924941, CurrSamplesPerSec=5.705162885752711, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:22:29,542] [INFO] [timer.py:197:stop] 0/3842, RunningAvgSamplesPerSec=6.329637525436198, CurrSamplesPerSec=5.709632413326189, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:22:40,818] [INFO] [timer.py:197:stop] 0/3844, RunningAvgSamplesPerSec=6.329654567588082, CurrSamplesPerSec=5.723388903993835, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:22:52,114] [INFO] [timer.py:197:stop] 0/3846, RunningAvgSamplesPerSec=6.329664886989055, CurrSamplesPerSec=5.725275377311717, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:23:03,448] [INFO] [timer.py:197:stop] 0/3848, RunningAvgSamplesPerSec=6.329665610995146, CurrSamplesPerSec=5.708845564011121, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:23:14,755] [INFO] [timer.py:197:stop] 0/3850, RunningAvgSamplesPerSec=6.329672720076725, CurrSamplesPerSec=5.696633793536779, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0032, 'learning_rate': 6.846666666666667e-06, 'epoch': 8.16} [2022-12-17 02:23:26,091] [INFO] [timer.py:197:stop] 0/3852, RunningAvgSamplesPerSec=6.32967063641164, CurrSamplesPerSec=5.6939845950122026, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:23:37,356] [INFO] [timer.py:197:stop] 0/3854, RunningAvgSamplesPerSec=6.329685549843888, CurrSamplesPerSec=5.725235813794708, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:23:48,671] [INFO] [timer.py:197:stop] 0/3856, RunningAvgSamplesPerSec=6.329690025034303, CurrSamplesPerSec=5.696226659165504, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:24:00,016] [INFO] [timer.py:197:stop] 0/3858, RunningAvgSamplesPerSec=6.329685215773519, CurrSamplesPerSec=5.693312174994899, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:24:11,426] [INFO] [logging.py:68:log_dist] [Rank 0] step=1930, skipped=5, 
lr=[6.835555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 02:24:11,428] [INFO] [timer.py:197:stop] 0/3860, RunningAvgSamplesPerSec=6.329690796188916, CurrSamplesPerSec=5.701824776245682, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:24:22,723] [INFO] [timer.py:197:stop] 0/3862, RunningAvgSamplesPerSec=6.3296961670571354, CurrSamplesPerSec=5.69953982082623, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:24:34,021] [INFO] [timer.py:197:stop] 0/3864, RunningAvgSamplesPerSec=6.329705244869385, CurrSamplesPerSec=5.705117052093761, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:24:45,327] [INFO] [timer.py:197:stop] 0/3866, RunningAvgSamplesPerSec=6.329712194155368, CurrSamplesPerSec=5.705711976888372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:24:56,636] [INFO] [timer.py:197:stop] 0/3868, RunningAvgSamplesPerSec=6.329722996438025, CurrSamplesPerSec=5.696094909021571, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:25:07,963] [INFO] [timer.py:197:stop] 0/3870, RunningAvgSamplesPerSec=6.329723610970681, CurrSamplesPerSec=5.699059915194582, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:25:19,267] [INFO] [timer.py:197:stop] 0/3872, RunningAvgSamplesPerSec=6.329730045520958, CurrSamplesPerSec=5.72029077571516, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:25:30,596] [INFO] [timer.py:197:stop] 0/3874, RunningAvgSamplesPerSec=6.329724069035336, CurrSamplesPerSec=5.675370020557602, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:25:41,918] [INFO] [timer.py:197:stop] 0/3876, RunningAvgSamplesPerSec=6.329725765685623, CurrSamplesPerSec=5.6958365030451725, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:25:53,221] [INFO] [timer.py:197:stop] 0/3878, RunningAvgSamplesPerSec=6.329733146930267, CurrSamplesPerSec=5.705232486420832, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:26:04,538] [INFO] [logging.py:68:log_dist] [Rank 0] step=1940, skipped=5, 
lr=[6.813333333333334e-06], mom=[[0.9, 0.999]] [2022-12-17 02:26:04,539] [INFO] [timer.py:197:stop] 0/3880, RunningAvgSamplesPerSec=6.329736432320369, CurrSamplesPerSec=5.710810421201311, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:26:15,836] [INFO] [timer.py:197:stop] 0/3882, RunningAvgSamplesPerSec=6.32974185638214, CurrSamplesPerSec=5.701551560238876, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:26:27,147] [INFO] [timer.py:197:stop] 0/3884, RunningAvgSamplesPerSec=6.32974730470015, CurrSamplesPerSec=5.702902636224327, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:26:38,434] [INFO] [timer.py:197:stop] 0/3886, RunningAvgSamplesPerSec=6.329757043427764, CurrSamplesPerSec=5.710765711679313, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:26:49,740] [INFO] [timer.py:197:stop] 0/3888, RunningAvgSamplesPerSec=6.3297655812523566, CurrSamplesPerSec=5.7131039378135435, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:27:01,009] [INFO] [timer.py:197:stop] 0/3890, RunningAvgSamplesPerSec=6.329781514470751, CurrSamplesPerSec=5.732171681234439, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:27:12,305] [INFO] [timer.py:197:stop] 0/3892, RunningAvgSamplesPerSec=6.329792827236585, CurrSamplesPerSec=5.713371452351093, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:27:23,746] [INFO] [timer.py:197:stop] 0/3894, RunningAvgSamplesPerSec=6.329804099269943, CurrSamplesPerSec=5.718544024331364, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:27:35,062] [INFO] [timer.py:197:stop] 0/3896, RunningAvgSamplesPerSec=6.32980797636874, CurrSamplesPerSec=5.699682864311109, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:27:46,340] [INFO] [timer.py:197:stop] 0/3898, RunningAvgSamplesPerSec=6.329824368151498, CurrSamplesPerSec=5.735463077042991, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:27:57,678] [INFO] [logging.py:68:log_dist] [Rank 0] step=1950, skipped=5, 
lr=[6.7911111111111115e-06], mom=[[0.9, 0.999]] [2022-12-17 02:27:57,679] [INFO] [timer.py:197:stop] 0/3900, RunningAvgSamplesPerSec=6.32982140713299, CurrSamplesPerSec=5.697316672085901, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0033, 'learning_rate': 6.7911111111111115e-06, 'epoch': 8.26} [2022-12-17 02:28:09,037] [INFO] [timer.py:197:stop] 0/3902, RunningAvgSamplesPerSec=6.329814432386257, CurrSamplesPerSec=5.684121475399644, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:28:20,330] [INFO] [timer.py:197:stop] 0/3904, RunningAvgSamplesPerSec=6.329825137878348, CurrSamplesPerSec=5.714867563515535, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:28:31,665] [INFO] [timer.py:197:stop] 0/3906, RunningAvgSamplesPerSec=6.329816776466052, CurrSamplesPerSec=5.673013657138756, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:28:42,966] [INFO] [timer.py:197:stop] 0/3908, RunningAvgSamplesPerSec=6.329824875290936, CurrSamplesPerSec=5.693127915195306, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:28:54,283] [INFO] [timer.py:197:stop] 0/3910, RunningAvgSamplesPerSec=6.329828643103604, CurrSamplesPerSec=5.70661806279359, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:29:05,642] [INFO] [timer.py:197:stop] 0/3912, RunningAvgSamplesPerSec=6.329813334548276, CurrSamplesPerSec=5.662994082657486, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:29:16,978] [INFO] [timer.py:197:stop] 0/3914, RunningAvgSamplesPerSec=6.329811090289342, CurrSamplesPerSec=5.696075086591611, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:29:28,257] [INFO] [timer.py:197:stop] 0/3916, RunningAvgSamplesPerSec=6.329826773522238, CurrSamplesPerSec=5.717712090899444, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:29:39,602] [INFO] [timer.py:197:stop] 0/3918, RunningAvgSamplesPerSec=6.329820899907599, CurrSamplesPerSec=5.685816185999066, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:29:50,928] 
[INFO] [logging.py:68:log_dist] [Rank 0] step=1960, skipped=5, lr=[6.768888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 02:29:50,929] [INFO] [timer.py:197:stop] 0/3920, RunningAvgSamplesPerSec=6.329822966762694, CurrSamplesPerSec=5.708094131136135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:30:02,254] [INFO] [timer.py:197:stop] 0/3922, RunningAvgSamplesPerSec=6.329823766455425, CurrSamplesPerSec=5.702115460318731, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:30:13,585] [INFO] [timer.py:197:stop] 0/3924, RunningAvgSamplesPerSec=6.329823133167677, CurrSamplesPerSec=5.6947779843335375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:30:24,876] [INFO] [timer.py:197:stop] 0/3926, RunningAvgSamplesPerSec=6.329829857024279, CurrSamplesPerSec=5.710925114045982, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:30:36,201] [INFO] [timer.py:197:stop] 0/3928, RunningAvgSamplesPerSec=6.329830280622669, CurrSamplesPerSec=5.70457171231726, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:30:47,521] [INFO] [timer.py:197:stop] 0/3930, RunningAvgSamplesPerSec=6.329826977832305, CurrSamplesPerSec=5.68627989112763, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:30:58,834] [INFO] [timer.py:197:stop] 0/3932, RunningAvgSamplesPerSec=6.329825438915529, CurrSamplesPerSec=5.691355959464071, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:31:10,157] [INFO] [timer.py:197:stop] 0/3934, RunningAvgSamplesPerSec=6.3298256067430785, CurrSamplesPerSec=5.703385855202171, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:31:21,457] [INFO] [timer.py:197:stop] 0/3936, RunningAvgSamplesPerSec=6.329834354702879, CurrSamplesPerSec=5.703952057433165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:31:32,788] [INFO] [timer.py:197:stop] 0/3938, RunningAvgSamplesPerSec=6.329832611573973, CurrSamplesPerSec=5.681675823878106, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:31:44,075] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=1970, skipped=5, lr=[6.746666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 02:31:44,076] [INFO] [timer.py:197:stop] 0/3940, RunningAvgSamplesPerSec=6.329846029232881, CurrSamplesPerSec=5.723946147254611, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:31:55,411] [INFO] [timer.py:197:stop] 0/3942, RunningAvgSamplesPerSec=6.32984408328006, CurrSamplesPerSec=5.6886128123952995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:32:06,722] [INFO] [timer.py:197:stop] 0/3944, RunningAvgSamplesPerSec=6.329853290493203, CurrSamplesPerSec=5.7044201802018675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:32:18,031] [INFO] [timer.py:197:stop] 0/3946, RunningAvgSamplesPerSec=6.329859515800882, CurrSamplesPerSec=5.689871890505715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:32:29,345] [INFO] [timer.py:197:stop] 0/3948, RunningAvgSamplesPerSec=6.329865167645183, CurrSamplesPerSec=5.704709674156191, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:32:40,666] [INFO] [timer.py:197:stop] 0/3950, RunningAvgSamplesPerSec=6.3298671032413685, CurrSamplesPerSec=5.698391134644368, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0035, 'learning_rate': 6.735555555555556e-06, 'epoch': 8.37} [2022-12-17 02:32:51,918] [INFO] [timer.py:197:stop] 0/3952, RunningAvgSamplesPerSec=6.3298891298894295, CurrSamplesPerSec=5.735413324032536, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:33:03,196] [INFO] [timer.py:197:stop] 0/3954, RunningAvgSamplesPerSec=6.329902086633342, CurrSamplesPerSec=5.721910526340695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:33:14,499] [INFO] [timer.py:197:stop] 0/3956, RunningAvgSamplesPerSec=6.329905822912193, CurrSamplesPerSec=5.691245912501485, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:33:25,826] [INFO] [timer.py:197:stop] 0/3958, RunningAvgSamplesPerSec=6.329906249651454, CurrSamplesPerSec=5.686801981118413, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:33:37,181] [INFO] [logging.py:68:log_dist] [Rank 0] step=1980, skipped=5, lr=[6.724444444444444e-06], mom=[[0.9, 0.999]] [2022-12-17 02:33:37,182] [INFO] [timer.py:197:stop] 0/3960, RunningAvgSamplesPerSec=6.329902215016532, CurrSamplesPerSec=5.65284436763619, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:33:48,619] [INFO] [timer.py:197:stop] 0/3962, RunningAvgSamplesPerSec=6.329875482170933, CurrSamplesPerSec=5.621293394307236, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:34:00,103] [INFO] [timer.py:197:stop] 0/3964, RunningAvgSamplesPerSec=6.329851487880202, CurrSamplesPerSec=5.641696364327095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:34:11,445] [INFO] [timer.py:197:stop] 0/3966, RunningAvgSamplesPerSec=6.329844461992682, CurrSamplesPerSec=5.665015728261031, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:34:22,785] [INFO] [timer.py:197:stop] 0/3968, RunningAvgSamplesPerSec=6.329850249763681, CurrSamplesPerSec=5.709415765273105, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:34:34,214] [INFO] [timer.py:197:stop] 0/3970, RunningAvgSamplesPerSec=6.329839127590746, CurrSamplesPerSec=5.679090016835304, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:34:45,611] [INFO] [timer.py:197:stop] 0/3972, RunningAvgSamplesPerSec=6.329838474291728, CurrSamplesPerSec=5.683097385387977, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:34:57,039] [INFO] [timer.py:197:stop] 0/3974, RunningAvgSamplesPerSec=6.329831710087475, CurrSamplesPerSec=5.673985662776667, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:35:08,398] [INFO] [timer.py:197:stop] 0/3976, RunningAvgSamplesPerSec=6.329831826261857, CurrSamplesPerSec=5.678151333205712, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:35:19,776] [INFO] [timer.py:197:stop] 0/3978, RunningAvgSamplesPerSec=6.329833221018402, CurrSamplesPerSec=5.700546360450433, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:35:31,153] [INFO] [logging.py:68:log_dist] [Rank 0] step=1990, skipped=5, lr=[6.702222222222224e-06], mom=[[0.9, 0.999]] [2022-12-17 02:35:31,155] [INFO] [timer.py:197:stop] 0/3980, RunningAvgSamplesPerSec=6.329824364419316, CurrSamplesPerSec=5.666521307084508, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:35:42,508] [INFO] [timer.py:197:stop] 0/3982, RunningAvgSamplesPerSec=6.329817725488676, CurrSamplesPerSec=5.68016122078727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:35:53,948] [INFO] [timer.py:197:stop] 0/3984, RunningAvgSamplesPerSec=6.329809411355801, CurrSamplesPerSec=5.673447457141223, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:36:05,340] [INFO] [timer.py:197:stop] 0/3986, RunningAvgSamplesPerSec=6.329800045419625, CurrSamplesPerSec=5.673120602221541, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:36:16,732] [INFO] [timer.py:197:stop] 0/3988, RunningAvgSamplesPerSec=6.32980055398527, CurrSamplesPerSec=5.690515269998576, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:36:28,053] [INFO] [timer.py:197:stop] 0/3990, RunningAvgSamplesPerSec=6.329802810481422, CurrSamplesPerSec=5.70927976144318, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:36:39,397] [INFO] [timer.py:197:stop] 0/3992, RunningAvgSamplesPerSec=6.329802819995766, CurrSamplesPerSec=5.675330423820994, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:36:50,710] [INFO] [timer.py:197:stop] 0/3994, RunningAvgSamplesPerSec=6.329812884980746, CurrSamplesPerSec=5.71857594226552, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:37:02,127] [INFO] [timer.py:197:stop] 0/3996, RunningAvgSamplesPerSec=6.3297965344729885, CurrSamplesPerSec=5.635572645721929, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:37:13,514] [INFO] [timer.py:197:stop] 0/3998, RunningAvgSamplesPerSec=6.329792063006823, CurrSamplesPerSec=5.678675775718042, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 02:37:24,866] [INFO] [logging.py:68:log_dist] [Rank 0] step=2000, skipped=5, lr=[6.680000000000001e-06], mom=[[0.9, 0.999]] [2022-12-17 02:37:24,868] [INFO] [timer.py:197:stop] 0/4000, RunningAvgSamplesPerSec=6.3297894302029585, CurrSamplesPerSec=5.6972066363266105, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0034, 'learning_rate': 6.680000000000001e-06, 'epoch': 8.47} {'eval_loss': 0.18408203125, 'eval_wer': 9.830389863906742, 'eval_runtime': 2143.7282, 'eval_samples_per_second': 3.598, 'eval_steps_per_second': 0.45, 'epoch': 8.47} [2022-12-17 03:13:12,165] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step2000 is begin to save! [2022-12-17 03:13:12,175] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-2000/global_step2000/mp_rank_00_model_states.pt [2022-12-17 03:13:12,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-2000/global_step2000/mp_rank_00_model_states.pt... [2022-12-17 03:13:15,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-2000/global_step2000/mp_rank_00_model_states.pt. [2022-12-17 03:13:15,704] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-2000/global_step2000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2022-12-17 03:13:31,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-2000/global_step2000/zero_pp_rank_0_mp_rank_00_optim_states.pt. [2022-12-17 03:13:31,523] [INFO] [engine.py:3269:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-2000/global_step2000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-12-17 03:13:31,523] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 
[2022-12-17 03:15:34,464] [INFO] [timer.py:197:stop] 0/4002, RunningAvgSamplesPerSec=6.32971650220364, CurrSamplesPerSec=5.432966773902989, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:15:46,027] [INFO] [timer.py:197:stop] 0/4004, RunningAvgSamplesPerSec=6.329736699376116, CurrSamplesPerSec=5.732430946357427, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:15:57,711] [INFO] [timer.py:197:stop] 0/4006, RunningAvgSamplesPerSec=6.329714692420776, CurrSamplesPerSec=5.7052121153177975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:16:09,000] [INFO] [timer.py:197:stop] 0/4008, RunningAvgSamplesPerSec=6.329729049348178, CurrSamplesPerSec=5.721881254416873, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:16:20,309] [INFO] [timer.py:197:stop] 0/4010, RunningAvgSamplesPerSec=6.32973210769902, CurrSamplesPerSec=5.6921966559580985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:16:31,950] [INFO] [timer.py:197:stop] 0/4012, RunningAvgSamplesPerSec=6.329631531896987, CurrSamplesPerSec=5.6903598999459915, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:16:43,277] [INFO] [timer.py:197:stop] 0/4014, RunningAvgSamplesPerSec=6.3296284082486265, CurrSamplesPerSec=5.682677266268421, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:16:54,617] [INFO] [timer.py:197:stop] 0/4016, RunningAvgSamplesPerSec=6.329620990535829, CurrSamplesPerSec=5.6637007057802835, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:17:06,320] [INFO] [timer.py:197:stop] 0/4018, RunningAvgSamplesPerSec=6.32961940678384, CurrSamplesPerSec=5.690537948945658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:17:17,689] [INFO] [logging.py:68:log_dist] [Rank 0] step=2010, skipped=5, lr=[6.657777777777779e-06], mom=[[0.9, 0.999]] [2022-12-17 03:17:17,691] [INFO] [timer.py:197:stop] 0/4020, RunningAvgSamplesPerSec=6.329603248560779, CurrSamplesPerSec=5.636197649705268, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
03:17:28,951] [INFO] [timer.py:197:stop] 0/4022, RunningAvgSamplesPerSec=6.329621839411704, CurrSamplesPerSec=5.729020022201075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:17:40,565] [INFO] [timer.py:197:stop] 0/4024, RunningAvgSamplesPerSec=6.329636464480563, CurrSamplesPerSec=5.72667583837786, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:17:51,974] [INFO] [timer.py:197:stop] 0/4026, RunningAvgSamplesPerSec=6.329651877567379, CurrSamplesPerSec=5.730219255254423, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:18:03,329] [INFO] [timer.py:197:stop] 0/4028, RunningAvgSamplesPerSec=6.3296407044135305, CurrSamplesPerSec=5.6457342331832026, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:18:14,820] [INFO] [timer.py:197:stop] 0/4030, RunningAvgSamplesPerSec=6.329647226145717, CurrSamplesPerSec=5.706841778088452, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:18:26,288] [INFO] [timer.py:197:stop] 0/4032, RunningAvgSamplesPerSec=6.329657786556081, CurrSamplesPerSec=5.715247919575367, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:18:37,684] [INFO] [timer.py:197:stop] 0/4034, RunningAvgSamplesPerSec=6.329633943231989, CurrSamplesPerSec=5.610441031101313, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:18:49,059] [INFO] [timer.py:197:stop] 0/4036, RunningAvgSamplesPerSec=6.329643210614384, CurrSamplesPerSec=5.71373385290509, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:19:00,385] [INFO] [timer.py:197:stop] 0/4038, RunningAvgSamplesPerSec=6.329647008206662, CurrSamplesPerSec=5.697957140380308, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:19:11,655] [INFO] [logging.py:68:log_dist] [Rank 0] step=2020, skipped=5, lr=[6.6355555555555565e-06], mom=[[0.9, 0.999]] [2022-12-17 03:19:11,656] [INFO] [timer.py:197:stop] 0/4040, RunningAvgSamplesPerSec=6.329656548870118, CurrSamplesPerSec=5.712982834924393, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
03:19:22,964] [INFO] [timer.py:197:stop] 0/4042, RunningAvgSamplesPerSec=6.329664344944999, CurrSamplesPerSec=5.707049494758968, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:19:34,251] [INFO] [timer.py:197:stop] 0/4044, RunningAvgSamplesPerSec=6.329677524006703, CurrSamplesPerSec=5.702061923775371, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:19:45,586] [INFO] [timer.py:197:stop] 0/4046, RunningAvgSamplesPerSec=6.329673899017344, CurrSamplesPerSec=5.676386285852038, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:19:56,869] [INFO] [timer.py:197:stop] 0/4048, RunningAvgSamplesPerSec=6.329685060089817, CurrSamplesPerSec=5.705445179125403, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:20:08,153] [INFO] [timer.py:197:stop] 0/4050, RunningAvgSamplesPerSec=6.329690147179106, CurrSamplesPerSec=5.718421959793726, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0044, 'learning_rate': 6.6244444444444445e-06, 'epoch': 8.58} [2022-12-17 03:20:19,482] [INFO] [timer.py:197:stop] 0/4052, RunningAvgSamplesPerSec=6.329687546775836, CurrSamplesPerSec=5.677962528789538, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:20:30,761] [INFO] [timer.py:197:stop] 0/4054, RunningAvgSamplesPerSec=6.329704134400702, CurrSamplesPerSec=5.731804735092358, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:20:42,053] [INFO] [timer.py:197:stop] 0/4056, RunningAvgSamplesPerSec=6.3297167432347985, CurrSamplesPerSec=5.729271665429117, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:20:53,666] [INFO] [timer.py:197:stop] 0/4058, RunningAvgSamplesPerSec=6.329700568900112, CurrSamplesPerSec=5.670981747554733, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:21:04,942] [INFO] [logging.py:68:log_dist] [Rank 0] step=2030, skipped=5, lr=[6.613333333333334e-06], mom=[[0.9, 0.999]] [2022-12-17 03:21:04,944] [INFO] [timer.py:197:stop] 0/4060, RunningAvgSamplesPerSec=6.329714148054933, 
CurrSamplesPerSec=5.7095685344444, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:21:16,406] [INFO] [timer.py:197:stop] 0/4062, RunningAvgSamplesPerSec=6.32971542335254, CurrSamplesPerSec=5.707395803713119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:21:27,826] [INFO] [timer.py:197:stop] 0/4064, RunningAvgSamplesPerSec=6.329716546008745, CurrSamplesPerSec=5.7032450492428435, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:21:39,082] [INFO] [timer.py:197:stop] 0/4066, RunningAvgSamplesPerSec=6.329727383544931, CurrSamplesPerSec=5.708138313313967, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:21:50,416] [INFO] [timer.py:197:stop] 0/4068, RunningAvgSamplesPerSec=6.329723384009874, CurrSamplesPerSec=5.685249002830177, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:22:01,855] [INFO] [timer.py:197:stop] 0/4070, RunningAvgSamplesPerSec=6.329735288298022, CurrSamplesPerSec=5.720577250197871, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:22:13,353] [INFO] [timer.py:197:stop] 0/4072, RunningAvgSamplesPerSec=6.329753146864072, CurrSamplesPerSec=5.725375509643616, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:22:24,779] [INFO] [timer.py:197:stop] 0/4074, RunningAvgSamplesPerSec=6.329715954461023, CurrSamplesPerSec=5.568953772876529, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:22:36,381] [INFO] [timer.py:197:stop] 0/4076, RunningAvgSamplesPerSec=6.329719132052436, CurrSamplesPerSec=5.680185980782685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:22:47,778] [INFO] [timer.py:197:stop] 0/4078, RunningAvgSamplesPerSec=6.3297134755114035, CurrSamplesPerSec=5.69351552665189, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:22:59,261] [INFO] [logging.py:68:log_dist] [Rank 0] step=2040, skipped=5, lr=[6.591111111111112e-06], mom=[[0.9, 0.999]] [2022-12-17 03:22:59,263] [INFO] [timer.py:197:stop] 0/4080, RunningAvgSamplesPerSec=6.329662099764259, 
CurrSamplesPerSec=5.5199089195498106, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:23:10,789] [INFO] [timer.py:197:stop] 0/4082, RunningAvgSamplesPerSec=6.32966716446522, CurrSamplesPerSec=5.701408180754435, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:23:22,027] [INFO] [timer.py:197:stop] 0/4084, RunningAvgSamplesPerSec=6.329677835786168, CurrSamplesPerSec=5.70064635601823, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:23:33,530] [INFO] [timer.py:197:stop] 0/4086, RunningAvgSamplesPerSec=6.329621485972424, CurrSamplesPerSec=5.497445663780493, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:23:44,854] [INFO] [timer.py:197:stop] 0/4088, RunningAvgSamplesPerSec=6.3296199090930605, CurrSamplesPerSec=5.686864869676773, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:23:56,148] [INFO] [timer.py:197:stop] 0/4090, RunningAvgSamplesPerSec=6.329627497694356, CurrSamplesPerSec=5.707074489723329, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:24:07,626] [INFO] [timer.py:197:stop] 0/4092, RunningAvgSamplesPerSec=6.329579498207154, CurrSamplesPerSec=5.5263138434487615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:24:18,910] [INFO] [timer.py:197:stop] 0/4094, RunningAvgSamplesPerSec=6.3295852834765025, CurrSamplesPerSec=5.718602012903586, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:24:30,243] [INFO] [timer.py:197:stop] 0/4096, RunningAvgSamplesPerSec=6.329582388154523, CurrSamplesPerSec=5.686816197184262, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:24:41,558] [INFO] [timer.py:197:stop] 0/4098, RunningAvgSamplesPerSec=6.329574192901616, CurrSamplesPerSec=5.650972720985214, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:24:52,865] [INFO] [logging.py:68:log_dist] [Rank 0] step=2050, skipped=5, lr=[6.568888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 03:24:52,867] [INFO] [timer.py:197:stop] 0/4100, RunningAvgSamplesPerSec=6.329577261261745, 
CurrSamplesPerSec=5.709207876322052, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0038, 'learning_rate': 6.568888888888889e-06, 'epoch': 8.69} [2022-12-17 03:25:04,148] [INFO] [timer.py:197:stop] 0/4102, RunningAvgSamplesPerSec=6.329584710681415, CurrSamplesPerSec=5.7096530588978, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:25:15,433] [INFO] [timer.py:197:stop] 0/4104, RunningAvgSamplesPerSec=6.329591737707764, CurrSamplesPerSec=5.7040087808033695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:25:26,720] [INFO] [timer.py:197:stop] 0/4106, RunningAvgSamplesPerSec=6.3296020891744265, CurrSamplesPerSec=5.718893922820299, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:25:38,007] [INFO] [timer.py:197:stop] 0/4108, RunningAvgSamplesPerSec=6.329611851038329, CurrSamplesPerSec=5.717244706560494, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:25:49,318] [INFO] [timer.py:197:stop] 0/4110, RunningAvgSamplesPerSec=6.32961398025573, CurrSamplesPerSec=5.707869589430014, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:26:00,697] [INFO] [timer.py:197:stop] 0/4112, RunningAvgSamplesPerSec=6.329599969613452, CurrSamplesPerSec=5.651735367307539, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:26:11,995] [INFO] [timer.py:197:stop] 0/4114, RunningAvgSamplesPerSec=6.329607728066423, CurrSamplesPerSec=5.702019531298125, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:26:23,266] [INFO] [timer.py:197:stop] 0/4116, RunningAvgSamplesPerSec=6.329624606775776, CurrSamplesPerSec=5.729385878152315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:26:34,582] [INFO] [timer.py:197:stop] 0/4118, RunningAvgSamplesPerSec=6.3296290795189885, CurrSamplesPerSec=5.688858265982743, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:26:45,872] [INFO] [logging.py:68:log_dist] [Rank 0] step=2060, skipped=5, lr=[6.546666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 03:26:45,874] [INFO] 
[timer.py:197:stop] 0/4120, RunningAvgSamplesPerSec=6.3296391052996706, CurrSamplesPerSec=5.710607776037591, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:26:57,215] [INFO] [timer.py:197:stop] 0/4122, RunningAvgSamplesPerSec=6.3296344831495865, CurrSamplesPerSec=5.668019787475643, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:27:08,515] [INFO] [timer.py:197:stop] 0/4124, RunningAvgSamplesPerSec=6.3296411666454695, CurrSamplesPerSec=5.702394787187077, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:27:19,807] [INFO] [timer.py:197:stop] 0/4126, RunningAvgSamplesPerSec=6.3296543447409706, CurrSamplesPerSec=5.716563129758426, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:27:31,092] [INFO] [timer.py:197:stop] 0/4128, RunningAvgSamplesPerSec=6.329665769004512, CurrSamplesPerSec=5.713112449253873, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:27:42,402] [INFO] [timer.py:197:stop] 0/4130, RunningAvgSamplesPerSec=6.329669537874054, CurrSamplesPerSec=5.685855206581097, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:27:53,659] [INFO] [timer.py:197:stop] 0/4132, RunningAvgSamplesPerSec=6.329684738201763, CurrSamplesPerSec=5.730757275477921, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:28:05,083] [INFO] [timer.py:197:stop] 0/4134, RunningAvgSamplesPerSec=6.329653695177852, CurrSamplesPerSec=5.704550376108452, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:28:16,391] [INFO] [timer.py:197:stop] 0/4136, RunningAvgSamplesPerSec=6.329658226333892, CurrSamplesPerSec=5.669268567918603, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:28:27,660] [INFO] [timer.py:197:stop] 0/4138, RunningAvgSamplesPerSec=6.329669431371385, CurrSamplesPerSec=5.7140525115073775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:28:39,033] [INFO] [logging.py:68:log_dist] [Rank 0] step=2070, skipped=5, lr=[6.524444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 03:28:39,035] [INFO] 
[timer.py:197:stop] 0/4140, RunningAvgSamplesPerSec=6.329653569925222, CurrSamplesPerSec=5.702104074634847, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:28:50,314] [INFO] [timer.py:197:stop] 0/4142, RunningAvgSamplesPerSec=6.329667150732836, CurrSamplesPerSec=5.711580798448425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:29:01,628] [INFO] [timer.py:197:stop] 0/4144, RunningAvgSamplesPerSec=6.329672264319184, CurrSamplesPerSec=5.7046015347856525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:29:13,190] [INFO] [timer.py:197:stop] 0/4146, RunningAvgSamplesPerSec=6.329590255909673, CurrSamplesPerSec=5.693526153503661, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:29:24,479] [INFO] [timer.py:197:stop] 0/4148, RunningAvgSamplesPerSec=6.329600253682247, CurrSamplesPerSec=5.712139877360979, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:29:35,740] [INFO] [timer.py:197:stop] 0/4150, RunningAvgSamplesPerSec=6.329614214241194, CurrSamplesPerSec=5.721944677297023, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0043, 'learning_rate': 6.513333333333333e-06, 'epoch': 8.79} [2022-12-17 03:29:47,451] [INFO] [timer.py:197:stop] 0/4152, RunningAvgSamplesPerSec=6.329625743832189, CurrSamplesPerSec=5.713240853488009, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:29:58,927] [INFO] [timer.py:197:stop] 0/4154, RunningAvgSamplesPerSec=6.32962401605577, CurrSamplesPerSec=5.681033239220668, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:30:10,298] [INFO] [timer.py:197:stop] 0/4156, RunningAvgSamplesPerSec=6.329609371145642, CurrSamplesPerSec=5.640112222567924, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:30:21,742] [INFO] [timer.py:197:stop] 0/4158, RunningAvgSamplesPerSec=6.329619336010558, CurrSamplesPerSec=5.725745541206016, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:30:33,113] [INFO] [logging.py:68:log_dist] [Rank 0] step=2080, skipped=5, 
lr=[6.502222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 03:30:33,114] [INFO] [timer.py:197:stop] 0/4160, RunningAvgSamplesPerSec=6.3296304931523135, CurrSamplesPerSec=5.719744481367389, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:30:44,377] [INFO] [timer.py:197:stop] 0/4162, RunningAvgSamplesPerSec=6.329639665828144, CurrSamplesPerSec=5.7231092248855, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:30:55,952] [INFO] [timer.py:197:stop] 0/4164, RunningAvgSamplesPerSec=6.329642890850235, CurrSamplesPerSec=5.7104019868553015, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:31:07,478] [INFO] [timer.py:197:stop] 0/4166, RunningAvgSamplesPerSec=6.329641465120764, CurrSamplesPerSec=5.693680488845232, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:31:18,787] [INFO] [timer.py:197:stop] 0/4168, RunningAvgSamplesPerSec=6.329646013398303, CurrSamplesPerSec=5.698077365169859, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:31:30,204] [INFO] [timer.py:197:stop] 0/4170, RunningAvgSamplesPerSec=6.32965932265026, CurrSamplesPerSec=5.731016412667152, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:31:41,572] [INFO] [timer.py:197:stop] 0/4172, RunningAvgSamplesPerSec=6.329659992211202, CurrSamplesPerSec=5.703579747419184, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:31:53,022] [INFO] [timer.py:197:stop] 0/4174, RunningAvgSamplesPerSec=6.329620320914269, CurrSamplesPerSec=5.552174278511241, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:32:04,605] [INFO] [timer.py:197:stop] 0/4176, RunningAvgSamplesPerSec=6.329618251779879, CurrSamplesPerSec=5.664145751377774, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:32:16,047] [INFO] [timer.py:197:stop] 0/4178, RunningAvgSamplesPerSec=6.329620119317804, CurrSamplesPerSec=5.691267149303224, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:32:27,557] [INFO] [logging.py:68:log_dist] [Rank 0] step=2090, skipped=5, 
lr=[6.480000000000001e-06], mom=[[0.9, 0.999]] [2022-12-17 03:32:27,559] [INFO] [timer.py:197:stop] 0/4180, RunningAvgSamplesPerSec=6.329559581631733, CurrSamplesPerSec=5.49025277515664, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:32:38,847] [INFO] [timer.py:197:stop] 0/4182, RunningAvgSamplesPerSec=6.329566571122446, CurrSamplesPerSec=5.7119519660201625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:32:50,221] [INFO] [timer.py:197:stop] 0/4184, RunningAvgSamplesPerSec=6.329551512684384, CurrSamplesPerSec=5.658288559220713, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:33:01,779] [INFO] [timer.py:197:stop] 0/4186, RunningAvgSamplesPerSec=6.329482129285263, CurrSamplesPerSec=5.455197789627966, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:33:13,234] [INFO] [timer.py:197:stop] 0/4188, RunningAvgSamplesPerSec=6.329489056102356, CurrSamplesPerSec=5.689115074579025, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:33:24,534] [INFO] [timer.py:197:stop] 0/4190, RunningAvgSamplesPerSec=6.3294930470222965, CurrSamplesPerSec=5.715814775678154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:33:36,200] [INFO] [timer.py:197:stop] 0/4192, RunningAvgSamplesPerSec=6.329385693595587, CurrSamplesPerSec=5.342715526710825, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:33:47,585] [INFO] [timer.py:197:stop] 0/4194, RunningAvgSamplesPerSec=6.329387920504537, CurrSamplesPerSec=5.708342239921322, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:33:58,934] [INFO] [timer.py:197:stop] 0/4196, RunningAvgSamplesPerSec=6.329393759067435, CurrSamplesPerSec=5.698761557733716, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:34:10,299] [INFO] [timer.py:197:stop] 0/4198, RunningAvgSamplesPerSec=6.329377635096463, CurrSamplesPerSec=5.627035752573431, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:34:21,607] [INFO] [logging.py:68:log_dist] [Rank 0] step=2100, skipped=5, 
lr=[6.457777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 03:34:21,609] [INFO] [timer.py:197:stop] 0/4200, RunningAvgSamplesPerSec=6.329387313311945, CurrSamplesPerSec=5.706954612955244, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0033, 'learning_rate': 6.457777777777778e-06, 'epoch': 8.9} [2022-12-17 03:34:32,898] [INFO] [timer.py:197:stop] 0/4202, RunningAvgSamplesPerSec=6.3293938964875185, CurrSamplesPerSec=5.699129125038513, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:34:44,297] [INFO] [timer.py:197:stop] 0/4204, RunningAvgSamplesPerSec=6.329363594365425, CurrSamplesPerSec=5.581797360025495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:34:55,847] [INFO] [timer.py:197:stop] 0/4206, RunningAvgSamplesPerSec=6.329371810931073, CurrSamplesPerSec=5.705324158184767, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:35:07,174] [INFO] [timer.py:197:stop] 0/4208, RunningAvgSamplesPerSec=6.3293724138440295, CurrSamplesPerSec=5.683446328781715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:35:18,805] [INFO] [timer.py:197:stop] 0/4210, RunningAvgSamplesPerSec=6.329274039267405, CurrSamplesPerSec=5.35519432319045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:35:30,104] [INFO] [timer.py:197:stop] 0/4212, RunningAvgSamplesPerSec=6.32927545611261, CurrSamplesPerSec=5.688271912984834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:35:41,411] [INFO] [timer.py:197:stop] 0/4214, RunningAvgSamplesPerSec=6.329279504683213, CurrSamplesPerSec=5.702676806686563, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:35:52,661] [INFO] [timer.py:197:stop] 0/4216, RunningAvgSamplesPerSec=6.3292975459712455, CurrSamplesPerSec=5.735788331041529, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:36:03,980] [INFO] [timer.py:197:stop] 0/4218, RunningAvgSamplesPerSec=6.3292954506333325, CurrSamplesPerSec=5.720244698668586, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:36:15,270] 
[INFO] [logging.py:68:log_dist] [Rank 0] step=2110, skipped=5, lr=[6.435555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 03:36:15,271] [INFO] [timer.py:197:stop] 0/4220, RunningAvgSamplesPerSec=6.3293015183733585, CurrSamplesPerSec=5.70264627745624, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:36:26,550] [INFO] [timer.py:197:stop] 0/4222, RunningAvgSamplesPerSec=6.329314799213386, CurrSamplesPerSec=5.726156173463574, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:36:37,964] [INFO] [timer.py:197:stop] 0/4224, RunningAvgSamplesPerSec=6.329297925836548, CurrSamplesPerSec=5.658801704212691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:36:49,489] [INFO] [timer.py:197:stop] 0/4226, RunningAvgSamplesPerSec=6.329300030042395, CurrSamplesPerSec=5.680489849179637, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:37:00,832] [INFO] [timer.py:197:stop] 0/4228, RunningAvgSamplesPerSec=6.329290592139952, CurrSamplesPerSec=5.6715307493274505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:37:12,211] [INFO] [timer.py:197:stop] 0/4230, RunningAvgSamplesPerSec=6.329296799735275, CurrSamplesPerSec=5.710345622158048, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:37:23,534] [INFO] [timer.py:197:stop] 0/4232, RunningAvgSamplesPerSec=6.329293914072135, CurrSamplesPerSec=5.700618996107484, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:37:35,089] [INFO] [timer.py:197:stop] 0/4234, RunningAvgSamplesPerSec=6.329223078174755, CurrSamplesPerSec=5.474385340765658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:37:46,417] [INFO] [timer.py:197:stop] 0/4236, RunningAvgSamplesPerSec=6.329221569757894, CurrSamplesPerSec=5.6885783348551735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:37:57,764] [INFO] [timer.py:197:stop] 0/4238, RunningAvgSamplesPerSec=6.3292032239593174, CurrSamplesPerSec=5.621259492527623, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:38:09,097] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=2120, skipped=5, lr=[6.4133333333333335e-06], mom=[[0.9, 0.999]] [2022-12-17 03:38:09,099] [INFO] [timer.py:197:stop] 0/4240, RunningAvgSamplesPerSec=6.3291967428627265, CurrSamplesPerSec=5.660347665602659, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:38:20,450] [INFO] [timer.py:197:stop] 0/4242, RunningAvgSamplesPerSec=6.329187590411158, CurrSamplesPerSec=5.664508866697526, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:38:31,735] [INFO] [timer.py:197:stop] 0/4244, RunningAvgSamplesPerSec=6.329196166706487, CurrSamplesPerSec=5.720654542271054, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:38:43,043] [INFO] [timer.py:197:stop] 0/4246, RunningAvgSamplesPerSec=6.3291981518458655, CurrSamplesPerSec=5.692124476172127, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:38:51,551] [INFO] [timer.py:197:stop] 0/4248, RunningAvgSamplesPerSec=6.329926955740864, CurrSamplesPerSec=10.191885733525591, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:39:02,808] [INFO] [timer.py:197:stop] 0/4250, RunningAvgSamplesPerSec=6.329945497555158, CurrSamplesPerSec=5.748404824921448, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0034, 'learning_rate': 6.402222222222223e-06, 'epoch': 9.0} [2022-12-17 03:39:14,100] [INFO] [timer.py:197:stop] 0/4252, RunningAvgSamplesPerSec=6.329951721289756, CurrSamplesPerSec=5.724151693184711, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:39:25,414] [INFO] [timer.py:197:stop] 0/4254, RunningAvgSamplesPerSec=6.329950908156978, CurrSamplesPerSec=5.700597931576377, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:39:36,694] [INFO] [timer.py:197:stop] 0/4256, RunningAvgSamplesPerSec=6.329955788741976, CurrSamplesPerSec=5.695662231242793, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:39:47,972] [INFO] [timer.py:197:stop] 0/4258, RunningAvgSamplesPerSec=6.329966695173077, CurrSamplesPerSec=5.712562662301858, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:39:59,270] [INFO] [logging.py:68:log_dist] [Rank 0] step=2130, skipped=5, lr=[6.391111111111111e-06], mom=[[0.9, 0.999]] [2022-12-17 03:39:59,272] [INFO] [timer.py:197:stop] 0/4260, RunningAvgSamplesPerSec=6.3299661333167165, CurrSamplesPerSec=5.71034149203002, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:40:10,580] [INFO] [timer.py:197:stop] 0/4262, RunningAvgSamplesPerSec=6.329958580709758, CurrSamplesPerSec=5.673393738119202, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:40:21,866] [INFO] [timer.py:197:stop] 0/4264, RunningAvgSamplesPerSec=6.3299615564115275, CurrSamplesPerSec=5.704911658444872, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:40:33,141] [INFO] [timer.py:197:stop] 0/4266, RunningAvgSamplesPerSec=6.329968199591096, CurrSamplesPerSec=5.7203561137522, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:40:44,456] [INFO] [timer.py:197:stop] 0/4268, RunningAvgSamplesPerSec=6.329969998219967, CurrSamplesPerSec=5.713600561987379, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:40:55,751] [INFO] [timer.py:197:stop] 0/4270, RunningAvgSamplesPerSec=6.329975276219748, CurrSamplesPerSec=5.6975735193423205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:41:07,069] [INFO] [timer.py:197:stop] 0/4272, RunningAvgSamplesPerSec=6.329973921749979, CurrSamplesPerSec=5.702552995722206, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:41:18,377] [INFO] [timer.py:197:stop] 0/4274, RunningAvgSamplesPerSec=6.329976146136992, CurrSamplesPerSec=5.694227613441014, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:41:29,677] [INFO] [timer.py:197:stop] 0/4276, RunningAvgSamplesPerSec=6.329976503267094, CurrSamplesPerSec=5.705686993857206, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:41:40,961] [INFO] [timer.py:197:stop] 0/4278, RunningAvgSamplesPerSec=6.329981382624964, CurrSamplesPerSec=5.711891438350277, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:41:52,457] [INFO] [logging.py:68:log_dist] [Rank 0] step=2140, skipped=5, lr=[6.368888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 03:41:52,459] [INFO] [timer.py:197:stop] 0/4280, RunningAvgSamplesPerSec=6.329982693290168, CurrSamplesPerSec=5.707232958038073, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:42:03,725] [INFO] [timer.py:197:stop] 0/4282, RunningAvgSamplesPerSec=6.329991865303273, CurrSamplesPerSec=5.713054571959741, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:42:15,042] [INFO] [timer.py:197:stop] 0/4284, RunningAvgSamplesPerSec=6.329990865925492, CurrSamplesPerSec=5.676679663925739, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:42:26,344] [INFO] [timer.py:197:stop] 0/4286, RunningAvgSamplesPerSec=6.3299970498352485, CurrSamplesPerSec=5.709284132894068, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:42:37,618] [INFO] [timer.py:197:stop] 0/4288, RunningAvgSamplesPerSec=6.330009159304688, CurrSamplesPerSec=5.725098322614197, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:42:48,894] [INFO] [timer.py:197:stop] 0/4290, RunningAvgSamplesPerSec=6.330021472725454, CurrSamplesPerSec=5.7156183469429696, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:43:00,189] [INFO] [timer.py:197:stop] 0/4292, RunningAvgSamplesPerSec=6.330023543445085, CurrSamplesPerSec=5.6900711368439065, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:43:11,467] [INFO] [timer.py:197:stop] 0/4294, RunningAvgSamplesPerSec=6.33003465710397, CurrSamplesPerSec=5.722863979052479, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:43:22,753] [INFO] [timer.py:197:stop] 0/4296, RunningAvgSamplesPerSec=6.330044033654043, CurrSamplesPerSec=5.711008949653978, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:43:34,052] [INFO] [timer.py:197:stop] 0/4298, RunningAvgSamplesPerSec=6.330051736381823, CurrSamplesPerSec=5.7098993599376096, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:43:45,358] [INFO] [logging.py:68:log_dist] [Rank 0] step=2150, skipped=5, lr=[6.346666666666668e-06], mom=[[0.9, 0.999]] [2022-12-17 03:43:45,360] [INFO] [timer.py:197:stop] 0/4300, RunningAvgSamplesPerSec=6.330053968045217, CurrSamplesPerSec=5.711852059591176, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0027, 'learning_rate': 6.346666666666668e-06, 'epoch': 9.11} [2022-12-17 03:43:56,648] [INFO] [timer.py:197:stop] 0/4302, RunningAvgSamplesPerSec=6.330057923831548, CurrSamplesPerSec=5.699157196643189, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:44:07,884] [INFO] [timer.py:197:stop] 0/4304, RunningAvgSamplesPerSec=6.3300738850474225, CurrSamplesPerSec=5.737495354931139, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:44:19,393] [INFO] [timer.py:197:stop] 0/4306, RunningAvgSamplesPerSec=6.330077660065506, CurrSamplesPerSec=5.6889624334353215, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:44:30,687] [INFO] [timer.py:197:stop] 0/4308, RunningAvgSamplesPerSec=6.330085346146435, CurrSamplesPerSec=5.708143896824679, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:44:41,955] [INFO] [timer.py:197:stop] 0/4310, RunningAvgSamplesPerSec=6.330095058863617, CurrSamplesPerSec=5.716104697029398, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:44:53,244] [INFO] [timer.py:197:stop] 0/4312, RunningAvgSamplesPerSec=6.330103442896721, CurrSamplesPerSec=5.709839361572828, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:45:04,538] [INFO] [timer.py:197:stop] 0/4314, RunningAvgSamplesPerSec=6.3301104318862835, CurrSamplesPerSec=5.700382210684036, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:45:15,848] [INFO] [timer.py:197:stop] 0/4316, RunningAvgSamplesPerSec=6.33011144671632, CurrSamplesPerSec=5.70177754288311, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:45:27,176] [INFO] [timer.py:197:stop] 0/4318, 
RunningAvgSamplesPerSec=6.330108280879267, CurrSamplesPerSec=5.688100273302122, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:45:38,475] [INFO] [logging.py:68:log_dist] [Rank 0] step=2160, skipped=5, lr=[6.324444444444446e-06], mom=[[0.9, 0.999]] [2022-12-17 03:45:38,477] [INFO] [timer.py:197:stop] 0/4320, RunningAvgSamplesPerSec=6.330114475329232, CurrSamplesPerSec=5.704752106643522, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:45:49,797] [INFO] [timer.py:197:stop] 0/4322, RunningAvgSamplesPerSec=6.330116375057936, CurrSamplesPerSec=5.718255560844172, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:46:01,100] [INFO] [timer.py:197:stop] 0/4324, RunningAvgSamplesPerSec=6.330120682549647, CurrSamplesPerSec=5.712504066748646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:46:12,375] [INFO] [timer.py:197:stop] 0/4326, RunningAvgSamplesPerSec=6.330133929062189, CurrSamplesPerSec=5.72703455258396, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:46:23,714] [INFO] [timer.py:197:stop] 0/4328, RunningAvgSamplesPerSec=6.330129803058916, CurrSamplesPerSec=5.6888122116816495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:46:34,999] [INFO] [timer.py:197:stop] 0/4330, RunningAvgSamplesPerSec=6.330134832486639, CurrSamplesPerSec=5.709437380764239, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:46:46,276] [INFO] [timer.py:197:stop] 0/4332, RunningAvgSamplesPerSec=6.330142065442036, CurrSamplesPerSec=5.700766452184192, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:46:57,581] [INFO] [timer.py:197:stop] 0/4334, RunningAvgSamplesPerSec=6.330142658914399, CurrSamplesPerSec=5.686657414993794, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:47:08,853] [INFO] [timer.py:197:stop] 0/4336, RunningAvgSamplesPerSec=6.330150998651473, CurrSamplesPerSec=5.711459517079455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:47:20,162] [INFO] [timer.py:197:stop] 0/4338, 
RunningAvgSamplesPerSec=6.330149550688539, CurrSamplesPerSec=5.666942151202926, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:47:31,422] [INFO] [logging.py:68:log_dist] [Rank 0] step=2170, skipped=5, lr=[6.302222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 03:47:31,424] [INFO] [timer.py:197:stop] 0/4340, RunningAvgSamplesPerSec=6.330167309081633, CurrSamplesPerSec=5.715221149418475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:47:42,711] [INFO] [timer.py:197:stop] 0/4342, RunningAvgSamplesPerSec=6.33017199685608, CurrSamplesPerSec=5.686261100566463, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:47:54,166] [INFO] [timer.py:197:stop] 0/4344, RunningAvgSamplesPerSec=6.330183846704809, CurrSamplesPerSec=5.696465033629745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:48:05,427] [INFO] [timer.py:197:stop] 0/4346, RunningAvgSamplesPerSec=6.330196429462367, CurrSamplesPerSec=5.7171146611417045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:48:16,706] [INFO] [timer.py:197:stop] 0/4348, RunningAvgSamplesPerSec=6.330206849105902, CurrSamplesPerSec=5.714444680646246, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:48:28,017] [INFO] [timer.py:197:stop] 0/4350, RunningAvgSamplesPerSec=6.330208170318338, CurrSamplesPerSec=5.692302394200352, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0032, 'learning_rate': 6.291111111111111e-06, 'epoch': 9.22} [2022-12-17 03:48:39,257] [INFO] [timer.py:197:stop] 0/4352, RunningAvgSamplesPerSec=6.330230027558183, CurrSamplesPerSec=5.74510397307503, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:48:50,553] [INFO] [timer.py:197:stop] 0/4354, RunningAvgSamplesPerSec=6.330236067545069, CurrSamplesPerSec=5.6960760535362125, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:49:01,820] [INFO] [timer.py:197:stop] 0/4356, RunningAvgSamplesPerSec=6.330245950766376, CurrSamplesPerSec=5.709252318598857, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-17 03:49:13,074] [INFO] [timer.py:197:stop] 0/4358, RunningAvgSamplesPerSec=6.3302603683327785, CurrSamplesPerSec=5.723507031478542, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:49:24,355] [INFO] [logging.py:68:log_dist] [Rank 0] step=2180, skipped=5, lr=[6.280000000000001e-06], mom=[[0.9, 0.999]] [2022-12-17 03:49:24,357] [INFO] [timer.py:197:stop] 0/4360, RunningAvgSamplesPerSec=6.330268998724679, CurrSamplesPerSec=5.716786164512912, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:49:35,681] [INFO] [timer.py:197:stop] 0/4362, RunningAvgSamplesPerSec=6.330269874076075, CurrSamplesPerSec=5.709341448091649, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:49:46,987] [INFO] [timer.py:197:stop] 0/4364, RunningAvgSamplesPerSec=6.330273298176687, CurrSamplesPerSec=5.7026029071275754, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:49:58,302] [INFO] [timer.py:197:stop] 0/4366, RunningAvgSamplesPerSec=6.330274186013026, CurrSamplesPerSec=5.711801013859715, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:50:09,593] [INFO] [timer.py:197:stop] 0/4368, RunningAvgSamplesPerSec=6.330281236381866, CurrSamplesPerSec=5.714302111692187, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:50:20,878] [INFO] [timer.py:197:stop] 0/4370, RunningAvgSamplesPerSec=6.330293064442466, CurrSamplesPerSec=5.723522652006832, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:50:32,180] [INFO] [timer.py:197:stop] 0/4372, RunningAvgSamplesPerSec=6.330291145422422, CurrSamplesPerSec=5.684595016149614, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:50:43,488] [INFO] [timer.py:197:stop] 0/4374, RunningAvgSamplesPerSec=6.330295297779702, CurrSamplesPerSec=5.705242186997218, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:50:54,780] [INFO] [timer.py:197:stop] 0/4376, RunningAvgSamplesPerSec=6.330302984893042, CurrSamplesPerSec=5.711761879416802, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
03:51:06,066] [INFO] [timer.py:197:stop] 0/4378, RunningAvgSamplesPerSec=6.3303119676765185, CurrSamplesPerSec=5.730619763654686, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:51:17,354] [INFO] [logging.py:68:log_dist] [Rank 0] step=2190, skipped=5, lr=[6.2577777777777785e-06], mom=[[0.9, 0.999]] [2022-12-17 03:51:17,355] [INFO] [timer.py:197:stop] 0/4380, RunningAvgSamplesPerSec=6.330315521160135, CurrSamplesPerSec=5.717432966308704, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:51:28,622] [INFO] [timer.py:197:stop] 0/4382, RunningAvgSamplesPerSec=6.330325204840329, CurrSamplesPerSec=5.717960792921026, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:51:39,891] [INFO] [timer.py:197:stop] 0/4384, RunningAvgSamplesPerSec=6.330339635750931, CurrSamplesPerSec=5.732306574514285, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:51:51,192] [INFO] [timer.py:197:stop] 0/4386, RunningAvgSamplesPerSec=6.3303441565671585, CurrSamplesPerSec=5.7166807321598165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:52:02,549] [INFO] [timer.py:197:stop] 0/4388, RunningAvgSamplesPerSec=6.330334339923144, CurrSamplesPerSec=5.64699055579327, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:52:13,829] [INFO] [timer.py:197:stop] 0/4390, RunningAvgSamplesPerSec=6.330346790628633, CurrSamplesPerSec=5.7156687307214815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:52:25,123] [INFO] [timer.py:197:stop] 0/4392, RunningAvgSamplesPerSec=6.33035745595913, CurrSamplesPerSec=5.719217787639403, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:52:36,425] [INFO] [timer.py:197:stop] 0/4394, RunningAvgSamplesPerSec=6.330364386815062, CurrSamplesPerSec=5.710929245018288, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:52:47,718] [INFO] [timer.py:197:stop] 0/4396, RunningAvgSamplesPerSec=6.330374101398396, CurrSamplesPerSec=5.722247418954987, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
03:52:58,994] [INFO] [timer.py:197:stop] 0/4398, RunningAvgSamplesPerSec=6.330389488345945, CurrSamplesPerSec=5.73566454879658, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:53:10,356] [INFO] [logging.py:68:log_dist] [Rank 0] step=2200, skipped=5, lr=[6.235555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 03:53:10,358] [INFO] [timer.py:197:stop] 0/4400, RunningAvgSamplesPerSec=6.330381002027457, CurrSamplesPerSec=5.683259337924224, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0028, 'learning_rate': 6.235555555555556e-06, 'epoch': 9.32} [2022-12-17 03:53:21,684] [INFO] [timer.py:197:stop] 0/4402, RunningAvgSamplesPerSec=6.330380721761244, CurrSamplesPerSec=5.681286215532228, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:53:32,959] [INFO] [timer.py:197:stop] 0/4404, RunningAvgSamplesPerSec=6.33039288991234, CurrSamplesPerSec=5.704690276657935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:53:44,300] [INFO] [timer.py:197:stop] 0/4406, RunningAvgSamplesPerSec=6.3303819555293055, CurrSamplesPerSec=5.672073625116918, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:53:55,668] [INFO] [timer.py:197:stop] 0/4408, RunningAvgSamplesPerSec=6.3303853532268946, CurrSamplesPerSec=5.713735312329119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:54:06,948] [INFO] [timer.py:197:stop] 0/4410, RunningAvgSamplesPerSec=6.3303942961100095, CurrSamplesPerSec=5.709733942398946, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:54:18,343] [INFO] [timer.py:197:stop] 0/4412, RunningAvgSamplesPerSec=6.330372825528667, CurrSamplesPerSec=5.6119057705684305, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:54:29,681] [INFO] [timer.py:197:stop] 0/4414, RunningAvgSamplesPerSec=6.3303670020010205, CurrSamplesPerSec=5.667097441483019, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:54:41,064] [INFO] [timer.py:197:stop] 0/4416, RunningAvgSamplesPerSec=6.330358021308729, 
CurrSamplesPerSec=5.664311884604425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:54:52,354] [INFO] [timer.py:197:stop] 0/4418, RunningAvgSamplesPerSec=6.330364103786946, CurrSamplesPerSec=5.71063547488767, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:55:03,657] [INFO] [logging.py:68:log_dist] [Rank 0] step=2210, skipped=5, lr=[6.213333333333334e-06], mom=[[0.9, 0.999]] [2022-12-17 03:55:03,659] [INFO] [timer.py:197:stop] 0/4420, RunningAvgSamplesPerSec=6.330366350568987, CurrSamplesPerSec=5.696660631635473, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:55:15,004] [INFO] [timer.py:197:stop] 0/4422, RunningAvgSamplesPerSec=6.330356961663918, CurrSamplesPerSec=5.6580722121687375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:55:26,283] [INFO] [timer.py:197:stop] 0/4424, RunningAvgSamplesPerSec=6.330366423816739, CurrSamplesPerSec=5.7198256510353795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:55:37,536] [INFO] [timer.py:197:stop] 0/4426, RunningAvgSamplesPerSec=6.330382510418656, CurrSamplesPerSec=5.738779356959358, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:55:48,852] [INFO] [timer.py:197:stop] 0/4428, RunningAvgSamplesPerSec=6.330381786900965, CurrSamplesPerSec=5.713057246934883, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:56:00,144] [INFO] [timer.py:197:stop] 0/4430, RunningAvgSamplesPerSec=6.330387543094171, CurrSamplesPerSec=5.70596473014713, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:56:11,431] [INFO] [timer.py:197:stop] 0/4432, RunningAvgSamplesPerSec=6.3303949265506265, CurrSamplesPerSec=5.709284618611247, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:56:22,718] [INFO] [timer.py:197:stop] 0/4434, RunningAvgSamplesPerSec=6.330398974040821, CurrSamplesPerSec=5.728504821701217, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:56:34,005] [INFO] [timer.py:197:stop] 0/4436, RunningAvgSamplesPerSec=6.3304073938842675, 
CurrSamplesPerSec=5.695226960885154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:56:45,500] [INFO] [timer.py:197:stop] 0/4438, RunningAvgSamplesPerSec=6.330411289428148, CurrSamplesPerSec=5.701911251660377, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:56:56,822] [INFO] [logging.py:68:log_dist] [Rank 0] step=2220, skipped=5, lr=[6.191111111111111e-06], mom=[[0.9, 0.999]] [2022-12-17 03:56:56,823] [INFO] [timer.py:197:stop] 0/4440, RunningAvgSamplesPerSec=6.33041506366355, CurrSamplesPerSec=5.713683989698677, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:57:08,107] [INFO] [timer.py:197:stop] 0/4442, RunningAvgSamplesPerSec=6.3304343171901, CurrSamplesPerSec=5.7385732499032756, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:57:19,373] [INFO] [timer.py:197:stop] 0/4444, RunningAvgSamplesPerSec=6.330443822660566, CurrSamplesPerSec=5.722055426774207, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:57:30,636] [INFO] [timer.py:197:stop] 0/4446, RunningAvgSamplesPerSec=6.3304531891264775, CurrSamplesPerSec=5.723357908539943, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:57:41,949] [INFO] [timer.py:197:stop] 0/4448, RunningAvgSamplesPerSec=6.3304547739768315, CurrSamplesPerSec=5.717142910205345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:57:53,236] [INFO] [timer.py:197:stop] 0/4450, RunningAvgSamplesPerSec=6.33046078844013, CurrSamplesPerSec=5.701764705232637, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0027, 'learning_rate': 6.18e-06, 'epoch': 9.43} [2022-12-17 03:58:04,529] [INFO] [timer.py:197:stop] 0/4452, RunningAvgSamplesPerSec=6.330467659165346, CurrSamplesPerSec=5.713794419628813, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:58:15,810] [INFO] [timer.py:197:stop] 0/4454, RunningAvgSamplesPerSec=6.330472250504414, CurrSamplesPerSec=5.717310218572185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:58:27,136] [INFO] [timer.py:197:stop] 
0/4456, RunningAvgSamplesPerSec=6.330470611831681, CurrSamplesPerSec=5.699990516842288, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:58:38,446] [INFO] [timer.py:197:stop] 0/4458, RunningAvgSamplesPerSec=6.330472097937645, CurrSamplesPerSec=5.6944752430173935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:58:49,765] [INFO] [logging.py:68:log_dist] [Rank 0] step=2230, skipped=5, lr=[6.16888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 03:58:49,766] [INFO] [timer.py:197:stop] 0/4460, RunningAvgSamplesPerSec=6.330472808261691, CurrSamplesPerSec=5.685759101297652, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:59:01,199] [INFO] [timer.py:197:stop] 0/4462, RunningAvgSamplesPerSec=6.330476629555121, CurrSamplesPerSec=5.714341767530068, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:59:12,484] [INFO] [timer.py:197:stop] 0/4464, RunningAvgSamplesPerSec=6.330480789170642, CurrSamplesPerSec=5.678016814778323, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:59:23,796] [INFO] [timer.py:197:stop] 0/4466, RunningAvgSamplesPerSec=6.330481948473897, CurrSamplesPerSec=5.708953135855733, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:59:35,079] [INFO] [timer.py:197:stop] 0/4468, RunningAvgSamplesPerSec=6.33048619652435, CurrSamplesPerSec=5.698196869284114, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:59:46,403] [INFO] [timer.py:197:stop] 0/4470, RunningAvgSamplesPerSec=6.330483432308348, CurrSamplesPerSec=5.68910446418724, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 03:59:57,679] [INFO] [timer.py:197:stop] 0/4472, RunningAvgSamplesPerSec=6.330493710341828, CurrSamplesPerSec=5.72305016874372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:00:08,961] [INFO] [timer.py:197:stop] 0/4474, RunningAvgSamplesPerSec=6.330494027807299, CurrSamplesPerSec=5.6877512402808765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:00:20,267] [INFO] [timer.py:197:stop] 0/4476, 
RunningAvgSamplesPerSec=6.3304963388366415, CurrSamplesPerSec=5.690306825164165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:00:31,592] [INFO] [timer.py:197:stop] 0/4478, RunningAvgSamplesPerSec=6.330488847791333, CurrSamplesPerSec=5.673577202367607, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:00:42,914] [INFO] [logging.py:68:log_dist] [Rank 0] step=2240, skipped=5, lr=[6.146666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 04:00:42,915] [INFO] [timer.py:197:stop] 0/4480, RunningAvgSamplesPerSec=6.330486086094699, CurrSamplesPerSec=5.688008913144727, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:00:54,182] [INFO] [timer.py:197:stop] 0/4482, RunningAvgSamplesPerSec=6.330499265528688, CurrSamplesPerSec=5.71943688629642, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:01:05,462] [INFO] [timer.py:197:stop] 0/4484, RunningAvgSamplesPerSec=6.330509348678946, CurrSamplesPerSec=5.71629215134095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:01:16,762] [INFO] [timer.py:197:stop] 0/4486, RunningAvgSamplesPerSec=6.330513088250893, CurrSamplesPerSec=5.699628889360823, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:01:28,058] [INFO] [timer.py:197:stop] 0/4488, RunningAvgSamplesPerSec=6.330518727226009, CurrSamplesPerSec=5.707317898735957, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:01:39,342] [INFO] [timer.py:197:stop] 0/4490, RunningAvgSamplesPerSec=6.330526565520092, CurrSamplesPerSec=5.702028978709898, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:01:50,602] [INFO] [timer.py:197:stop] 0/4492, RunningAvgSamplesPerSec=6.33054246376338, CurrSamplesPerSec=5.729671796777712, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:02:01,875] [INFO] [timer.py:197:stop] 0/4494, RunningAvgSamplesPerSec=6.330555609378127, CurrSamplesPerSec=5.725686430644263, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:02:13,375] [INFO] [timer.py:197:stop] 0/4496, 
RunningAvgSamplesPerSec=6.330563896608057, CurrSamplesPerSec=5.706466179325695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:02:24,676] [INFO] [timer.py:197:stop] 0/4498, RunningAvgSamplesPerSec=6.330571840505921, CurrSamplesPerSec=5.7039484213581675, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:02:35,974] [INFO] [logging.py:68:log_dist] [Rank 0] step=2250, skipped=5, lr=[6.124444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 04:02:35,976] [INFO] [timer.py:197:stop] 0/4500, RunningAvgSamplesPerSec=6.33057808605072, CurrSamplesPerSec=5.700222912132732, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0029, 'learning_rate': 6.124444444444445e-06, 'epoch': 9.53} [2022-12-17 04:02:47,309] [INFO] [timer.py:197:stop] 0/4502, RunningAvgSamplesPerSec=6.330590813929422, CurrSamplesPerSec=5.704254595111555, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:02:58,592] [INFO] [timer.py:197:stop] 0/4504, RunningAvgSamplesPerSec=6.330599689229134, CurrSamplesPerSec=5.697111839696128, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:03:09,876] [INFO] [timer.py:197:stop] 0/4506, RunningAvgSamplesPerSec=6.330611905513456, CurrSamplesPerSec=5.717914753402765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:03:21,200] [INFO] [timer.py:197:stop] 0/4508, RunningAvgSamplesPerSec=6.330605019172982, CurrSamplesPerSec=5.673138107069762, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:03:32,518] [INFO] [timer.py:197:stop] 0/4510, RunningAvgSamplesPerSec=6.330605752558324, CurrSamplesPerSec=5.695185878279826, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:03:43,803] [INFO] [timer.py:197:stop] 0/4512, RunningAvgSamplesPerSec=6.330615522766701, CurrSamplesPerSec=5.706228665318071, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:03:55,121] [INFO] [timer.py:197:stop] 0/4514, RunningAvgSamplesPerSec=6.3306133846847645, CurrSamplesPerSec=5.685259358026691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-17 04:04:06,451] [INFO] [timer.py:197:stop] 0/4516, RunningAvgSamplesPerSec=6.330608909026556, CurrSamplesPerSec=5.700777106120688, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:04:17,795] [INFO] [timer.py:197:stop] 0/4518, RunningAvgSamplesPerSec=6.33060024301429, CurrSamplesPerSec=5.657230118958954, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:04:29,104] [INFO] [logging.py:68:log_dist] [Rank 0] step=2260, skipped=5, lr=[6.102222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 04:04:29,106] [INFO] [timer.py:197:stop] 0/4520, RunningAvgSamplesPerSec=6.330601376732257, CurrSamplesPerSec=5.713666476974156, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:04:40,371] [INFO] [timer.py:197:stop] 0/4522, RunningAvgSamplesPerSec=6.330606514763284, CurrSamplesPerSec=5.709888914787506, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:04:51,644] [INFO] [timer.py:197:stop] 0/4524, RunningAvgSamplesPerSec=6.330609315895327, CurrSamplesPerSec=5.686145950865563, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:05:02,950] [INFO] [timer.py:197:stop] 0/4526, RunningAvgSamplesPerSec=6.33061151512052, CurrSamplesPerSec=5.720672585516275, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:05:14,252] [INFO] [timer.py:197:stop] 0/4528, RunningAvgSamplesPerSec=6.330615066568691, CurrSamplesPerSec=5.714858803511764, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:05:25,506] [INFO] [timer.py:197:stop] 0/4530, RunningAvgSamplesPerSec=6.330622922685323, CurrSamplesPerSec=5.71164496542443, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:05:36,782] [INFO] [timer.py:197:stop] 0/4532, RunningAvgSamplesPerSec=6.330625751387737, CurrSamplesPerSec=5.687373812170686, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:05:48,077] [INFO] [timer.py:197:stop] 0/4534, RunningAvgSamplesPerSec=6.3306346663352, CurrSamplesPerSec=5.712731891063314, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
04:05:59,367] [INFO] [timer.py:197:stop] 0/4536, RunningAvgSamplesPerSec=6.330640941125124, CurrSamplesPerSec=5.700456536914458, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:06:10,640] [INFO] [timer.py:197:stop] 0/4538, RunningAvgSamplesPerSec=6.330652404193294, CurrSamplesPerSec=5.715381530744073, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:06:21,900] [INFO] [logging.py:68:log_dist] [Rank 0] step=2270, skipped=5, lr=[6.08e-06], mom=[[0.9, 0.999]] [2022-12-17 04:06:21,902] [INFO] [timer.py:197:stop] 0/4540, RunningAvgSamplesPerSec=6.330666901885576, CurrSamplesPerSec=5.737028654425584, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:06:33,224] [INFO] [timer.py:197:stop] 0/4542, RunningAvgSamplesPerSec=6.330665209915535, CurrSamplesPerSec=5.682952526304001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:06:44,491] [INFO] [timer.py:197:stop] 0/4544, RunningAvgSamplesPerSec=6.330674842163236, CurrSamplesPerSec=5.707736329246788, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:06:55,804] [INFO] [timer.py:197:stop] 0/4546, RunningAvgSamplesPerSec=6.330674269839647, CurrSamplesPerSec=5.701357321521387, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:07:07,141] [INFO] [timer.py:197:stop] 0/4548, RunningAvgSamplesPerSec=6.330686947984364, CurrSamplesPerSec=5.721433919111742, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:07:18,412] [INFO] [timer.py:197:stop] 0/4550, RunningAvgSamplesPerSec=6.330698464805578, CurrSamplesPerSec=5.721675871507792, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0036, 'learning_rate': 6.06888888888889e-06, 'epoch': 9.64} [2022-12-17 04:07:29,707] [INFO] [timer.py:197:stop] 0/4552, RunningAvgSamplesPerSec=6.330710496101829, CurrSamplesPerSec=5.720422916124233, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:07:41,068] [INFO] [timer.py:197:stop] 0/4554, RunningAvgSamplesPerSec=6.3307208209518695, CurrSamplesPerSec=5.725109311897351, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:07:52,440] [INFO] [timer.py:197:stop] 0/4556, RunningAvgSamplesPerSec=6.330723336313207, CurrSamplesPerSec=5.692474529011012, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:08:03,875] [INFO] [timer.py:197:stop] 0/4558, RunningAvgSamplesPerSec=6.330714611360339, CurrSamplesPerSec=5.664903350183139, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:08:15,227] [INFO] [logging.py:68:log_dist] [Rank 0] step=2280, skipped=5, lr=[6.057777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 04:08:15,228] [INFO] [timer.py:197:stop] 0/4560, RunningAvgSamplesPerSec=6.330726531708481, CurrSamplesPerSec=5.729612115939908, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:08:26,573] [INFO] [timer.py:197:stop] 0/4562, RunningAvgSamplesPerSec=6.330743349268332, CurrSamplesPerSec=5.720252987606335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:08:37,916] [INFO] [timer.py:197:stop] 0/4564, RunningAvgSamplesPerSec=6.33075457470811, CurrSamplesPerSec=5.7212497855880375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:08:49,272] [INFO] [timer.py:197:stop] 0/4566, RunningAvgSamplesPerSec=6.330758780378043, CurrSamplesPerSec=5.6899461841189565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:09:00,622] [INFO] [timer.py:197:stop] 0/4568, RunningAvgSamplesPerSec=6.330764615495463, CurrSamplesPerSec=5.712111191518389, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:09:11,925] [INFO] [timer.py:197:stop] 0/4570, RunningAvgSamplesPerSec=6.330760727916034, CurrSamplesPerSec=5.6708976453672735, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:09:23,181] [INFO] [timer.py:197:stop] 0/4572, RunningAvgSamplesPerSec=6.330776742335515, CurrSamplesPerSec=5.7173092444048805, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:09:34,546] [INFO] [timer.py:197:stop] 0/4574, RunningAvgSamplesPerSec=6.33078097114606, CurrSamplesPerSec=5.702969516215391, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:09:45,942] [INFO] [timer.py:197:stop] 0/4576, RunningAvgSamplesPerSec=6.330784055681368, CurrSamplesPerSec=5.708197790402674, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:09:57,327] [INFO] [timer.py:197:stop] 0/4578, RunningAvgSamplesPerSec=6.330790606004829, CurrSamplesPerSec=5.698967234714678, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:10:08,671] [INFO] [logging.py:68:log_dist] [Rank 0] step=2290, skipped=5, lr=[6.0355555555555555e-06], mom=[[0.9, 0.999]] [2022-12-17 04:10:08,672] [INFO] [timer.py:197:stop] 0/4580, RunningAvgSamplesPerSec=6.33079115086289, CurrSamplesPerSec=5.6993892819688305, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:10:20,063] [INFO] [timer.py:197:stop] 0/4582, RunningAvgSamplesPerSec=6.330792257325494, CurrSamplesPerSec=5.690597301088274, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:10:31,383] [INFO] [timer.py:197:stop] 0/4584, RunningAvgSamplesPerSec=6.33079201294549, CurrSamplesPerSec=5.686121861559865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:10:42,919] [INFO] [timer.py:197:stop] 0/4586, RunningAvgSamplesPerSec=6.33079594497458, CurrSamplesPerSec=5.688110397827378, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:10:54,250] [INFO] [timer.py:197:stop] 0/4588, RunningAvgSamplesPerSec=6.330806829957438, CurrSamplesPerSec=5.736196727782429, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:11:05,530] [INFO] [timer.py:197:stop] 0/4590, RunningAvgSamplesPerSec=6.330812007815817, CurrSamplesPerSec=5.71894777590225, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:11:16,800] [INFO] [timer.py:197:stop] 0/4592, RunningAvgSamplesPerSec=6.330825381671413, CurrSamplesPerSec=5.726484770543781, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:11:28,344] [INFO] [timer.py:197:stop] 0/4594, RunningAvgSamplesPerSec=6.3308329510418035, CurrSamplesPerSec=5.723417459157915, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:11:39,647] [INFO] [timer.py:197:stop] 0/4596, RunningAvgSamplesPerSec=6.330836708797295, CurrSamplesPerSec=5.699751121093182, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:11:51,143] [INFO] [timer.py:197:stop] 0/4598, RunningAvgSamplesPerSec=6.330848828375051, CurrSamplesPerSec=5.73858281883173, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:12:02,491] [INFO] [logging.py:68:log_dist] [Rank 0] step=2300, skipped=5, lr=[6.013333333333335e-06], mom=[[0.9, 0.999]] [2022-12-17 04:12:02,493] [INFO] [timer.py:197:stop] 0/4600, RunningAvgSamplesPerSec=6.330861170154268, CurrSamplesPerSec=5.724425614623796, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0032, 'learning_rate': 6.013333333333335e-06, 'epoch': 9.75} [2022-12-17 04:12:13,898] [INFO] [timer.py:197:stop] 0/4602, RunningAvgSamplesPerSec=6.330832410767395, CurrSamplesPerSec=5.5755094488871615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:12:25,351] [INFO] [timer.py:197:stop] 0/4604, RunningAvgSamplesPerSec=6.3308431891690775, CurrSamplesPerSec=5.727538489991829, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:12:36,684] [INFO] [timer.py:197:stop] 0/4606, RunningAvgSamplesPerSec=6.330837902683982, CurrSamplesPerSec=5.665191716507016, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:12:48,317] [INFO] [timer.py:197:stop] 0/4608, RunningAvgSamplesPerSec=6.330755379814556, CurrSamplesPerSec=5.394187377355717, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:12:59,623] [INFO] [timer.py:197:stop] 0/4610, RunningAvgSamplesPerSec=6.330756819466149, CurrSamplesPerSec=5.6961722661655605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:13:10,941] [INFO] [timer.py:197:stop] 0/4612, RunningAvgSamplesPerSec=6.330756151419777, CurrSamplesPerSec=5.695062150719643, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:13:22,525] [INFO] [timer.py:197:stop] 0/4614, 
RunningAvgSamplesPerSec=6.330679185520829, CurrSamplesPerSec=5.406924178710288, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:13:33,814] [INFO] [timer.py:197:stop] 0/4616, RunningAvgSamplesPerSec=6.330686433323466, CurrSamplesPerSec=5.6993048191151345, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:13:45,112] [INFO] [timer.py:197:stop] 0/4618, RunningAvgSamplesPerSec=6.330687694288799, CurrSamplesPerSec=5.700686306802098, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:13:56,440] [INFO] [logging.py:68:log_dist] [Rank 0] step=2310, skipped=5, lr=[5.991111111111112e-06], mom=[[0.9, 0.999]] [2022-12-17 04:13:56,442] [INFO] [timer.py:197:stop] 0/4620, RunningAvgSamplesPerSec=6.3306887015343865, CurrSamplesPerSec=5.662078148631941, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:14:07,726] [INFO] [timer.py:197:stop] 0/4622, RunningAvgSamplesPerSec=6.3306965751068045, CurrSamplesPerSec=5.7048692235839935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:14:19,039] [INFO] [timer.py:197:stop] 0/4624, RunningAvgSamplesPerSec=6.33070326498081, CurrSamplesPerSec=5.682439803730237, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:14:30,421] [INFO] [timer.py:197:stop] 0/4626, RunningAvgSamplesPerSec=6.330692695475473, CurrSamplesPerSec=5.638133414390749, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:14:41,715] [INFO] [timer.py:197:stop] 0/4628, RunningAvgSamplesPerSec=6.330699691584441, CurrSamplesPerSec=5.717189667889637, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:14:53,008] [INFO] [timer.py:197:stop] 0/4630, RunningAvgSamplesPerSec=6.330705696490953, CurrSamplesPerSec=5.695336920186152, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:15:04,352] [INFO] [timer.py:197:stop] 0/4632, RunningAvgSamplesPerSec=6.330698305517842, CurrSamplesPerSec=5.6486805486610585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:15:15,647] [INFO] [timer.py:197:stop] 0/4634, 
RunningAvgSamplesPerSec=6.330704188121081, CurrSamplesPerSec=5.720073561851484, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:15:27,219] [INFO] [timer.py:197:stop] 0/4636, RunningAvgSamplesPerSec=6.330706098111885, CurrSamplesPerSec=5.6777554829107535, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:15:38,794] [INFO] [timer.py:197:stop] 0/4638, RunningAvgSamplesPerSec=6.3307191696421805, CurrSamplesPerSec=5.740637442241396, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:15:50,098] [INFO] [logging.py:68:log_dist] [Rank 0] step=2320, skipped=5, lr=[5.96888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 04:15:50,100] [INFO] [timer.py:197:stop] 0/4640, RunningAvgSamplesPerSec=6.330714300595713, CurrSamplesPerSec=5.690648692358206, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:16:01,486] [INFO] [timer.py:197:stop] 0/4642, RunningAvgSamplesPerSec=6.3306947193985605, CurrSamplesPerSec=5.607131295528686, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:16:12,790] [INFO] [timer.py:197:stop] 0/4644, RunningAvgSamplesPerSec=6.3306961430292, CurrSamplesPerSec=5.696737762353815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:16:24,098] [INFO] [timer.py:197:stop] 0/4646, RunningAvgSamplesPerSec=6.330697606070938, CurrSamplesPerSec=5.699602265322378, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:16:35,405] [INFO] [timer.py:197:stop] 0/4648, RunningAvgSamplesPerSec=6.330698876455062, CurrSamplesPerSec=5.714086811994762, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:16:46,764] [INFO] [timer.py:197:stop] 0/4650, RunningAvgSamplesPerSec=6.330690469186335, CurrSamplesPerSec=5.701085118955992, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0039, 'learning_rate': 5.957777777777778e-06, 'epoch': 9.85} [2022-12-17 04:16:58,107] [INFO] [timer.py:197:stop] 0/4652, RunningAvgSamplesPerSec=6.3306828500294925, CurrSamplesPerSec=5.6933512985319545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-17 04:17:09,667] [INFO] [timer.py:197:stop] 0/4654, RunningAvgSamplesPerSec=6.330616710020938, CurrSamplesPerSec=5.467337482556833, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:17:21,103] [INFO] [timer.py:197:stop] 0/4656, RunningAvgSamplesPerSec=6.330615721908283, CurrSamplesPerSec=5.708993203154572, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:17:32,573] [INFO] [timer.py:197:stop] 0/4658, RunningAvgSamplesPerSec=6.33060873986457, CurrSamplesPerSec=5.676477032949712, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:17:44,017] [INFO] [logging.py:68:log_dist] [Rank 0] step=2330, skipped=5, lr=[5.946666666666668e-06], mom=[[0.9, 0.999]] [2022-12-17 04:17:44,034] [INFO] [timer.py:197:stop] 0/4660, RunningAvgSamplesPerSec=6.330570240212617, CurrSamplesPerSec=5.560654459755998, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:17:55,547] [INFO] [timer.py:197:stop] 0/4662, RunningAvgSamplesPerSec=6.33057605313111, CurrSamplesPerSec=5.711914774167443, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:18:07,012] [INFO] [timer.py:197:stop] 0/4664, RunningAvgSamplesPerSec=6.330578144373511, CurrSamplesPerSec=5.7113858757955525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:18:18,541] [INFO] [timer.py:197:stop] 0/4666, RunningAvgSamplesPerSec=6.330524070412179, CurrSamplesPerSec=5.487751174558908, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:18:30,062] [INFO] [timer.py:197:stop] 0/4668, RunningAvgSamplesPerSec=6.33053034689637, CurrSamplesPerSec=5.705891958199446, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:18:41,533] [INFO] [timer.py:197:stop] 0/4670, RunningAvgSamplesPerSec=6.330536314630159, CurrSamplesPerSec=5.701401641659355, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:18:52,850] [INFO] [timer.py:197:stop] 0/4672, RunningAvgSamplesPerSec=6.330536460115227, CurrSamplesPerSec=5.671820509831005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
04:19:04,157] [INFO] [timer.py:197:stop] 0/4674, RunningAvgSamplesPerSec=6.330554419779344, CurrSamplesPerSec=5.738497926375647, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:19:15,446] [INFO] [timer.py:197:stop] 0/4676, RunningAvgSamplesPerSec=6.330560749126159, CurrSamplesPerSec=5.716416559625785, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:19:26,970] [INFO] [timer.py:197:stop] 0/4678, RunningAvgSamplesPerSec=6.330504478971126, CurrSamplesPerSec=5.483726107472392, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:19:38,276] [INFO] [logging.py:68:log_dist] [Rank 0] step=2340, skipped=5, lr=[5.924444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 04:19:38,277] [INFO] [timer.py:197:stop] 0/4680, RunningAvgSamplesPerSec=6.3305070302567445, CurrSamplesPerSec=5.6899080722178645, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:19:49,582] [INFO] [timer.py:197:stop] 0/4682, RunningAvgSamplesPerSec=6.330510689869193, CurrSamplesPerSec=5.6991613106084875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:20:01,233] [INFO] [timer.py:197:stop] 0/4684, RunningAvgSamplesPerSec=6.330422201677251, CurrSamplesPerSec=5.354158658429851, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:20:12,510] [INFO] [timer.py:197:stop] 0/4686, RunningAvgSamplesPerSec=6.330433264181468, CurrSamplesPerSec=5.7141668480667915, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:20:23,801] [INFO] [timer.py:197:stop] 0/4688, RunningAvgSamplesPerSec=6.330442375056253, CurrSamplesPerSec=5.719161005075831, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:20:35,398] [INFO] [timer.py:197:stop] 0/4690, RunningAvgSamplesPerSec=6.330370208120584, CurrSamplesPerSec=5.415934115753085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:20:46,670] [INFO] [timer.py:197:stop] 0/4692, RunningAvgSamplesPerSec=6.330381828816718, CurrSamplesPerSec=5.713629262790996, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
04:20:58,159] [INFO] [timer.py:197:stop] 0/4694, RunningAvgSamplesPerSec=6.330391647125279, CurrSamplesPerSec=5.715702320400646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:21:09,552] [INFO] [timer.py:197:stop] 0/4696, RunningAvgSamplesPerSec=6.330390667727303, CurrSamplesPerSec=5.727730361440755, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:21:20,834] [INFO] [timer.py:197:stop] 0/4698, RunningAvgSamplesPerSec=6.33039563285138, CurrSamplesPerSec=5.694713229294793, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:21:32,212] [INFO] [logging.py:68:log_dist] [Rank 0] step=2350, skipped=5, lr=[5.902222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 04:21:32,214] [INFO] [timer.py:197:stop] 0/4700, RunningAvgSamplesPerSec=6.330408088819912, CurrSamplesPerSec=5.717672875415328, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0035, 'learning_rate': 5.902222222222223e-06, 'epoch': 9.96} [2022-12-17 04:21:43,841] [INFO] [timer.py:197:stop] 0/4702, RunningAvgSamplesPerSec=6.330403336990846, CurrSamplesPerSec=5.721540014669421, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:21:55,098] [INFO] [timer.py:197:stop] 0/4704, RunningAvgSamplesPerSec=6.330418358984936, CurrSamplesPerSec=5.722686341466305, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:22:06,354] [INFO] [timer.py:197:stop] 0/4706, RunningAvgSamplesPerSec=6.330429863484377, CurrSamplesPerSec=5.722186428937103, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:22:17,857] [INFO] [timer.py:197:stop] 0/4708, RunningAvgSamplesPerSec=6.3303796798625305, CurrSamplesPerSec=5.741505533618278, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:22:29,135] [INFO] [timer.py:197:stop] 0/4710, RunningAvgSamplesPerSec=6.330390373733256, CurrSamplesPerSec=5.729053768988964, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:22:40,498] [INFO] [timer.py:197:stop] 0/4712, RunningAvgSamplesPerSec=6.330378569329705, 
CurrSamplesPerSec=5.644133107749217, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:22:51,988] [INFO] [timer.py:197:stop] 0/4714, RunningAvgSamplesPerSec=6.330381143253887, CurrSamplesPerSec=5.696042210670469, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:23:03,456] [INFO] [timer.py:197:stop] 0/4716, RunningAvgSamplesPerSec=6.330393422869406, CurrSamplesPerSec=5.716793469438937, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:23:15,099] [INFO] [timer.py:197:stop] 0/4718, RunningAvgSamplesPerSec=6.33030756039602, CurrSamplesPerSec=5.366892319611965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:23:23,834] [INFO] [logging.py:68:log_dist] [Rank 0] step=2360, skipped=5, lr=[5.8800000000000005e-06], mom=[[0.9, 0.999]] [2022-12-17 04:23:23,835] [INFO] [timer.py:197:stop] 0/4720, RunningAvgSamplesPerSec=6.330972527543288, CurrSamplesPerSec=10.234348433383115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:23:35,092] [INFO] [timer.py:197:stop] 0/4722, RunningAvgSamplesPerSec=6.330985658534921, CurrSamplesPerSec=5.727015247337471, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:23:46,448] [INFO] [timer.py:197:stop] 0/4724, RunningAvgSamplesPerSec=6.330978732945515, CurrSamplesPerSec=5.661985472808679, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:23:58,056] [INFO] [timer.py:197:stop] 0/4726, RunningAvgSamplesPerSec=6.3309908988236305, CurrSamplesPerSec=5.7250834260977195, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:24:09,589] [INFO] [timer.py:197:stop] 0/4728, RunningAvgSamplesPerSec=6.331000839362033, CurrSamplesPerSec=5.721300756463225, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:24:20,996] [INFO] [timer.py:197:stop] 0/4730, RunningAvgSamplesPerSec=6.3309785423666876, CurrSamplesPerSec=5.590156459974123, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:24:32,356] [INFO] [timer.py:197:stop] 0/4732, RunningAvgSamplesPerSec=6.330984497430182, 
CurrSamplesPerSec=5.707438276169677, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:24:43,838] [INFO] [timer.py:197:stop] 0/4734, RunningAvgSamplesPerSec=6.33099808507498, CurrSamplesPerSec=5.725624878794464, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:24:55,147] [INFO] [timer.py:197:stop] 0/4736, RunningAvgSamplesPerSec=6.330995778328434, CurrSamplesPerSec=5.663898840429798, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:25:06,588] [INFO] [timer.py:197:stop] 0/4738, RunningAvgSamplesPerSec=6.331004069060532, CurrSamplesPerSec=5.698754540780964, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:25:18,033] [INFO] [logging.py:68:log_dist] [Rank 0] step=2370, skipped=5, lr=[5.857777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 04:25:18,035] [INFO] [timer.py:197:stop] 0/4740, RunningAvgSamplesPerSec=6.331010722188499, CurrSamplesPerSec=5.697760243937208, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:25:29,349] [INFO] [timer.py:197:stop] 0/4742, RunningAvgSamplesPerSec=6.331010941877606, CurrSamplesPerSec=5.6695437279093355, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:25:40,631] [INFO] [timer.py:197:stop] 0/4744, RunningAvgSamplesPerSec=6.331019301537191, CurrSamplesPerSec=5.703297153907961, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:25:51,912] [INFO] [timer.py:197:stop] 0/4746, RunningAvgSamplesPerSec=6.331027374547529, CurrSamplesPerSec=5.724606778497856, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:26:03,303] [INFO] [timer.py:197:stop] 0/4748, RunningAvgSamplesPerSec=6.331006687700225, CurrSamplesPerSec=5.595459055112661, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:26:14,592] [INFO] [timer.py:197:stop] 0/4750, RunningAvgSamplesPerSec=6.331013643470817, CurrSamplesPerSec=5.714340551085141, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.003, 'learning_rate': 5.846666666666667e-06, 'epoch': 10.06} [2022-12-17 04:26:25,876] [INFO] 
[timer.py:197:stop] 0/4752, RunningAvgSamplesPerSec=6.3310215455665695, CurrSamplesPerSec=5.689917238324662, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:26:37,436] [INFO] [timer.py:197:stop] 0/4754, RunningAvgSamplesPerSec=6.330957040473491, CurrSamplesPerSec=5.434912660706087, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:26:48,726] [INFO] [timer.py:197:stop] 0/4756, RunningAvgSamplesPerSec=6.330964057867343, CurrSamplesPerSec=5.710236783132878, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:27:00,107] [INFO] [timer.py:197:stop] 0/4758, RunningAvgSamplesPerSec=6.3309677032167375, CurrSamplesPerSec=5.715751001795795, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:27:11,573] [INFO] [logging.py:68:log_dist] [Rank 0] step=2380, skipped=5, lr=[5.8355555555555565e-06], mom=[[0.9, 0.999]] [2022-12-17 04:27:11,575] [INFO] [timer.py:197:stop] 0/4760, RunningAvgSamplesPerSec=6.330973467975903, CurrSamplesPerSec=5.680981781017513, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:27:22,902] [INFO] [timer.py:197:stop] 0/4762, RunningAvgSamplesPerSec=6.3309707590843685, CurrSamplesPerSec=5.6827352515239316, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:27:34,292] [INFO] [timer.py:197:stop] 0/4764, RunningAvgSamplesPerSec=6.330987778779006, CurrSamplesPerSec=5.753313279983759, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:27:45,702] [INFO] [timer.py:197:stop] 0/4766, RunningAvgSamplesPerSec=6.33099289520879, CurrSamplesPerSec=5.73214793479881, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:27:57,013] [INFO] [timer.py:197:stop] 0/4768, RunningAvgSamplesPerSec=6.33099395228108, CurrSamplesPerSec=5.686713794908089, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:28:08,481] [INFO] [timer.py:197:stop] 0/4770, RunningAvgSamplesPerSec=6.3310024447366215, CurrSamplesPerSec=5.73548121382306, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:28:19,983] [INFO] 
[timer.py:197:stop] 0/4772, RunningAvgSamplesPerSec=6.330988706634111, CurrSamplesPerSec=5.693727588150975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:28:31,256] [INFO] [timer.py:197:stop] 0/4774, RunningAvgSamplesPerSec=6.330996263719724, CurrSamplesPerSec=5.714685798939968, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:28:42,516] [INFO] [timer.py:197:stop] 0/4776, RunningAvgSamplesPerSec=6.331011640842076, CurrSamplesPerSec=5.7397435947875985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:28:53,854] [INFO] [timer.py:197:stop] 0/4778, RunningAvgSamplesPerSec=6.331007745299911, CurrSamplesPerSec=5.724216387060134, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:29:05,156] [INFO] [logging.py:68:log_dist] [Rank 0] step=2390, skipped=5, lr=[5.813333333333334e-06], mom=[[0.9, 0.999]] [2022-12-17 04:29:05,158] [INFO] [timer.py:197:stop] 0/4780, RunningAvgSamplesPerSec=6.331010742488278, CurrSamplesPerSec=5.699064996979088, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:29:16,468] [INFO] [timer.py:197:stop] 0/4782, RunningAvgSamplesPerSec=6.331012239137666, CurrSamplesPerSec=5.716506643368661, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:29:27,729] [INFO] [timer.py:197:stop] 0/4784, RunningAvgSamplesPerSec=6.331018580974164, CurrSamplesPerSec=5.714587743334454, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:29:39,013] [INFO] [timer.py:197:stop] 0/4786, RunningAvgSamplesPerSec=6.331022981960946, CurrSamplesPerSec=5.702331554684467, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:29:50,326] [INFO] [timer.py:197:stop] 0/4788, RunningAvgSamplesPerSec=6.331019901591877, CurrSamplesPerSec=5.690069448256989, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:30:01,604] [INFO] [timer.py:197:stop] 0/4790, RunningAvgSamplesPerSec=6.331029794130788, CurrSamplesPerSec=5.716243947629426, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:30:12,897] [INFO] 
[timer.py:197:stop] 0/4792, RunningAvgSamplesPerSec=6.331036251624487, CurrSamplesPerSec=5.719949238107081, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:30:24,158] [INFO] [timer.py:197:stop] 0/4794, RunningAvgSamplesPerSec=6.331043100155323, CurrSamplesPerSec=5.702531432327052, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:30:35,431] [INFO] [timer.py:197:stop] 0/4796, RunningAvgSamplesPerSec=6.331050018922631, CurrSamplesPerSec=5.6975415935642415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:30:46,826] [INFO] [timer.py:197:stop] 0/4798, RunningAvgSamplesPerSec=6.33102938541137, CurrSamplesPerSec=5.688286859627258, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:30:58,154] [INFO] [logging.py:68:log_dist] [Rank 0] step=2400, skipped=5, lr=[5.791111111111112e-06], mom=[[0.9, 0.999]] [2022-12-17 04:30:58,156] [INFO] [timer.py:197:stop] 0/4800, RunningAvgSamplesPerSec=6.331025671102678, CurrSamplesPerSec=5.684242320700877, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0021, 'learning_rate': 5.791111111111112e-06, 'epoch': 10.17} [2022-12-17 04:31:09,453] [INFO] [timer.py:197:stop] 0/4802, RunningAvgSamplesPerSec=6.331027120213348, CurrSamplesPerSec=5.682597147509798, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:31:20,797] [INFO] [timer.py:197:stop] 0/4804, RunningAvgSamplesPerSec=6.331016437845667, CurrSamplesPerSec=5.691636164035205, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:31:32,093] [INFO] [timer.py:197:stop] 0/4806, RunningAvgSamplesPerSec=6.331014511201373, CurrSamplesPerSec=5.683533932334821, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:31:43,425] [INFO] [timer.py:197:stop] 0/4808, RunningAvgSamplesPerSec=6.331017018483959, CurrSamplesPerSec=5.686735720732231, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:31:55,131] [INFO] [timer.py:197:stop] 0/4810, RunningAvgSamplesPerSec=6.331027561745868, CurrSamplesPerSec=5.721124191374545, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:32:06,778] [INFO] [timer.py:197:stop] 0/4812, RunningAvgSamplesPerSec=6.331032378578249, CurrSamplesPerSec=5.701915854066887, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:32:18,288] [INFO] [timer.py:197:stop] 0/4814, RunningAvgSamplesPerSec=6.330999850169187, CurrSamplesPerSec=5.556238848627807, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:32:29,583] [INFO] [timer.py:197:stop] 0/4816, RunningAvgSamplesPerSec=6.331008162515726, CurrSamplesPerSec=5.728340769234183, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:32:40,915] [INFO] [timer.py:197:stop] 0/4818, RunningAvgSamplesPerSec=6.331002794289343, CurrSamplesPerSec=5.681108264065701, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:32:52,634] [INFO] [logging.py:68:log_dist] [Rank 0] step=2410, skipped=5, lr=[5.768888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 04:32:52,636] [INFO] [timer.py:197:stop] 0/4820, RunningAvgSamplesPerSec=6.330896323352138, CurrSamplesPerSec=5.290471094035281, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:33:03,977] [INFO] [timer.py:197:stop] 0/4822, RunningAvgSamplesPerSec=6.330893064737957, CurrSamplesPerSec=5.677873415545301, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:33:15,309] [INFO] [timer.py:197:stop] 0/4824, RunningAvgSamplesPerSec=6.330895551737563, CurrSamplesPerSec=5.6946078846918695, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:33:26,713] [INFO] [timer.py:197:stop] 0/4826, RunningAvgSamplesPerSec=6.330872884371823, CurrSamplesPerSec=5.635442266683619, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:33:37,998] [INFO] [timer.py:197:stop] 0/4828, RunningAvgSamplesPerSec=6.33088129819278, CurrSamplesPerSec=5.694139921496582, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:33:49,282] [INFO] [timer.py:197:stop] 0/4830, RunningAvgSamplesPerSec=6.330889321460371, CurrSamplesPerSec=5.706906323939215, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:34:00,782] [INFO] [timer.py:197:stop] 0/4832, RunningAvgSamplesPerSec=6.330877685241006, CurrSamplesPerSec=5.637208456846233, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:34:12,038] [INFO] [timer.py:197:stop] 0/4834, RunningAvgSamplesPerSec=6.330888486726185, CurrSamplesPerSec=5.713096642313446, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:34:23,435] [INFO] [timer.py:197:stop] 0/4836, RunningAvgSamplesPerSec=6.330895314332208, CurrSamplesPerSec=5.717684566931951, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:34:34,736] [INFO] [timer.py:197:stop] 0/4838, RunningAvgSamplesPerSec=6.330900525193966, CurrSamplesPerSec=5.68793563429523, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:34:46,005] [INFO] [logging.py:68:log_dist] [Rank 0] step=2420, skipped=5, lr=[5.746666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 04:34:46,007] [INFO] [timer.py:197:stop] 0/4840, RunningAvgSamplesPerSec=6.33091151618933, CurrSamplesPerSec=5.7051672509014955, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:34:57,541] [INFO] [timer.py:197:stop] 0/4842, RunningAvgSamplesPerSec=6.330914801125688, CurrSamplesPerSec=5.700826744304333, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:35:09,022] [INFO] [timer.py:197:stop] 0/4844, RunningAvgSamplesPerSec=6.3309054184735265, CurrSamplesPerSec=5.671777607111656, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:35:20,280] [INFO] [timer.py:197:stop] 0/4846, RunningAvgSamplesPerSec=6.330915487625754, CurrSamplesPerSec=5.706281552323623, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:35:31,871] [INFO] [timer.py:197:stop] 0/4848, RunningAvgSamplesPerSec=6.330914274265737, CurrSamplesPerSec=5.6780453994108475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:35:43,422] [INFO] [timer.py:197:stop] 0/4850, RunningAvgSamplesPerSec=6.33090621324211, CurrSamplesPerSec=5.731395739761191, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0021, 'learning_rate': 5.735555555555557e-06, 'epoch': 10.28} [2022-12-17 04:35:54,655] [INFO] [timer.py:197:stop] 0/4852, RunningAvgSamplesPerSec=6.33092348176634, CurrSamplesPerSec=5.737195166807862, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:36:05,945] [INFO] [timer.py:197:stop] 0/4854, RunningAvgSamplesPerSec=6.33093041408174, CurrSamplesPerSec=5.719320876462623, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:36:17,387] [INFO] [timer.py:197:stop] 0/4856, RunningAvgSamplesPerSec=6.330920262213994, CurrSamplesPerSec=5.719814194516181, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:36:28,701] [INFO] [timer.py:197:stop] 0/4858, RunningAvgSamplesPerSec=6.330922055789028, CurrSamplesPerSec=5.686120416208014, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:36:40,001] [INFO] [logging.py:68:log_dist] [Rank 0] step=2430, skipped=5, lr=[5.724444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 04:36:40,004] [INFO] [timer.py:197:stop] 0/4860, RunningAvgSamplesPerSec=6.330924720708107, CurrSamplesPerSec=5.714667550123298, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:36:51,672] [INFO] [timer.py:197:stop] 0/4862, RunningAvgSamplesPerSec=6.3308836746394785, CurrSamplesPerSec=5.716470122590163, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:37:02,940] [INFO] [timer.py:197:stop] 0/4864, RunningAvgSamplesPerSec=6.330895535122308, CurrSamplesPerSec=5.733101619045565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:37:14,232] [INFO] [timer.py:197:stop] 0/4866, RunningAvgSamplesPerSec=6.330901820379269, CurrSamplesPerSec=5.705878131739231, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:37:25,508] [INFO] [timer.py:197:stop] 0/4868, RunningAvgSamplesPerSec=6.33090794129048, CurrSamplesPerSec=5.7053280385360585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:37:36,804] [INFO] [timer.py:197:stop] 0/4870, 
RunningAvgSamplesPerSec=6.330909647225291, CurrSamplesPerSec=5.696581568772256, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:37:48,184] [INFO] [timer.py:197:stop] 0/4872, RunningAvgSamplesPerSec=6.330889102496775, CurrSamplesPerSec=5.614864792503346, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:37:59,850] [INFO] [timer.py:197:stop] 0/4874, RunningAvgSamplesPerSec=6.330887659503165, CurrSamplesPerSec=5.7011781105036325, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:38:11,423] [INFO] [timer.py:197:stop] 0/4876, RunningAvgSamplesPerSec=6.330893858924424, CurrSamplesPerSec=5.698189611806278, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:38:22,772] [INFO] [timer.py:197:stop] 0/4878, RunningAvgSamplesPerSec=6.330885528518763, CurrSamplesPerSec=5.673804330756011, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:38:34,301] [INFO] [logging.py:68:log_dist] [Rank 0] step=2440, skipped=5, lr=[5.702222222222222e-06], mom=[[0.9, 0.999]] [2022-12-17 04:38:34,303] [INFO] [timer.py:197:stop] 0/4880, RunningAvgSamplesPerSec=6.330893383464845, CurrSamplesPerSec=5.717082028950172, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:38:45,586] [INFO] [timer.py:197:stop] 0/4882, RunningAvgSamplesPerSec=6.330902580256411, CurrSamplesPerSec=5.728562523461276, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:38:56,843] [INFO] [timer.py:197:stop] 0/4884, RunningAvgSamplesPerSec=6.330914329730515, CurrSamplesPerSec=5.71846191647385, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:39:08,130] [INFO] [timer.py:197:stop] 0/4886, RunningAvgSamplesPerSec=6.330922027749416, CurrSamplesPerSec=5.739567116761201, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:39:19,403] [INFO] [timer.py:197:stop] 0/4888, RunningAvgSamplesPerSec=6.330933121823192, CurrSamplesPerSec=5.723764781098753, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:39:31,079] [INFO] [timer.py:197:stop] 0/4890, 
RunningAvgSamplesPerSec=6.330841332485585, CurrSamplesPerSec=5.349498429200928, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:39:42,308] [INFO] [timer.py:197:stop] 0/4892, RunningAvgSamplesPerSec=6.330855947690373, CurrSamplesPerSec=5.728922696735184, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:39:53,592] [INFO] [timer.py:197:stop] 0/4894, RunningAvgSamplesPerSec=6.33086065662833, CurrSamplesPerSec=5.6968968661698085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:40:04,953] [INFO] [timer.py:197:stop] 0/4896, RunningAvgSamplesPerSec=6.330849295325178, CurrSamplesPerSec=5.6243742767222225, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:40:16,256] [INFO] [timer.py:197:stop] 0/4898, RunningAvgSamplesPerSec=6.33085324011296, CurrSamplesPerSec=5.696648542370528, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:40:27,669] [INFO] [logging.py:68:log_dist] [Rank 0] step=2450, skipped=5, lr=[5.68e-06], mom=[[0.9, 0.999]] [2022-12-17 04:40:27,671] [INFO] [timer.py:197:stop] 0/4900, RunningAvgSamplesPerSec=6.330869663344087, CurrSamplesPerSec=5.736259242619751, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.002, 'learning_rate': 5.68e-06, 'epoch': 10.38} [2022-12-17 04:40:39,234] [INFO] [timer.py:197:stop] 0/4902, RunningAvgSamplesPerSec=6.330875795922531, CurrSamplesPerSec=5.697687681112232, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:40:50,517] [INFO] [timer.py:197:stop] 0/4904, RunningAvgSamplesPerSec=6.330884282678793, CurrSamplesPerSec=5.71153437551238, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:41:01,993] [INFO] [timer.py:197:stop] 0/4906, RunningAvgSamplesPerSec=6.330895576355606, CurrSamplesPerSec=5.721800514246041, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:41:13,570] [INFO] [timer.py:197:stop] 0/4908, RunningAvgSamplesPerSec=6.330898229565119, CurrSamplesPerSec=5.7448197088231945, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:41:24,882] 
[INFO] [timer.py:197:stop] 0/4910, RunningAvgSamplesPerSec=6.330898886215009, CurrSamplesPerSec=5.6970586388875954, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:41:36,127] [INFO] [timer.py:197:stop] 0/4912, RunningAvgSamplesPerSec=6.330912386969782, CurrSamplesPerSec=5.699116541305495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:41:47,493] [INFO] [timer.py:197:stop] 0/4914, RunningAvgSamplesPerSec=6.330903721488429, CurrSamplesPerSec=5.698654369615493, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:41:58,764] [INFO] [timer.py:197:stop] 0/4916, RunningAvgSamplesPerSec=6.330914261240133, CurrSamplesPerSec=5.732406952943924, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:42:09,996] [INFO] [timer.py:197:stop] 0/4918, RunningAvgSamplesPerSec=6.330934678578837, CurrSamplesPerSec=5.731016657378124, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:42:21,272] [INFO] [logging.py:68:log_dist] [Rank 0] step=2460, skipped=5, lr=[5.657777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 04:42:21,274] [INFO] [timer.py:197:stop] 0/4920, RunningAvgSamplesPerSec=6.330939690491742, CurrSamplesPerSec=5.706582638769841, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:42:32,531] [INFO] [timer.py:197:stop] 0/4922, RunningAvgSamplesPerSec=6.330953956718717, CurrSamplesPerSec=5.741506024833396, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:42:43,762] [INFO] [timer.py:197:stop] 0/4924, RunningAvgSamplesPerSec=6.330971782272438, CurrSamplesPerSec=5.735810391804533, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:42:55,146] [INFO] [timer.py:197:stop] 0/4926, RunningAvgSamplesPerSec=6.330967983484556, CurrSamplesPerSec=5.715938432909474, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:43:06,393] [INFO] [timer.py:197:stop] 0/4928, RunningAvgSamplesPerSec=6.330985542332555, CurrSamplesPerSec=5.745078397976903, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:43:17,660] [INFO] 
[timer.py:197:stop] 0/4930, RunningAvgSamplesPerSec=6.330994799172411, CurrSamplesPerSec=5.723397202046335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:43:28,909] [INFO] [timer.py:197:stop] 0/4932, RunningAvgSamplesPerSec=6.3310084101233475, CurrSamplesPerSec=5.7396984310883274, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:43:40,183] [INFO] [timer.py:197:stop] 0/4934, RunningAvgSamplesPerSec=6.331015231943894, CurrSamplesPerSec=5.7254964057845985, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:43:51,455] [INFO] [timer.py:197:stop] 0/4936, RunningAvgSamplesPerSec=6.3310223319318535, CurrSamplesPerSec=5.731285852101456, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:44:02,808] [INFO] [timer.py:197:stop] 0/4938, RunningAvgSamplesPerSec=6.331024615497289, CurrSamplesPerSec=5.694806496347404, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:44:14,054] [INFO] [logging.py:68:log_dist] [Rank 0] step=2470, skipped=5, lr=[5.635555555555557e-06], mom=[[0.9, 0.999]] [2022-12-17 04:44:14,056] [INFO] [timer.py:197:stop] 0/4940, RunningAvgSamplesPerSec=6.331037883835673, CurrSamplesPerSec=5.726190863716782, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:44:25,339] [INFO] [timer.py:197:stop] 0/4942, RunningAvgSamplesPerSec=6.331046275843268, CurrSamplesPerSec=5.72533814277841, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:44:36,837] [INFO] [timer.py:197:stop] 0/4944, RunningAvgSamplesPerSec=6.330999783404063, CurrSamplesPerSec=5.714860750176948, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:44:48,130] [INFO] [timer.py:197:stop] 0/4946, RunningAvgSamplesPerSec=6.331006276714516, CurrSamplesPerSec=5.693992083341906, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:44:59,676] [INFO] [timer.py:197:stop] 0/4948, RunningAvgSamplesPerSec=6.330950637601543, CurrSamplesPerSec=5.472202700799134, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:45:11,183] [INFO] 
[timer.py:197:stop] 0/4950, RunningAvgSamplesPerSec=6.330955165485506, CurrSamplesPerSec=5.697722027285682, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0024, 'learning_rate': 5.624444444444445e-06, 'epoch': 10.49} [2022-12-17 04:45:22,615] [INFO] [timer.py:197:stop] 0/4952, RunningAvgSamplesPerSec=6.330968195653805, CurrSamplesPerSec=5.728223420030052, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:45:33,954] [INFO] [timer.py:197:stop] 0/4954, RunningAvgSamplesPerSec=6.330962540471415, CurrSamplesPerSec=5.636947553409113, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:45:45,419] [INFO] [timer.py:197:stop] 0/4956, RunningAvgSamplesPerSec=6.33096635642239, CurrSamplesPerSec=5.693265324059259, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:45:56,717] [INFO] [timer.py:197:stop] 0/4958, RunningAvgSamplesPerSec=6.330967716644158, CurrSamplesPerSec=5.69654095017297, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:46:07,973] [INFO] [logging.py:68:log_dist] [Rank 0] step=2480, skipped=5, lr=[5.613333333333334e-06], mom=[[0.9, 0.999]] [2022-12-17 04:46:07,974] [INFO] [timer.py:197:stop] 0/4960, RunningAvgSamplesPerSec=6.330982785160217, CurrSamplesPerSec=5.729682559028319, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:46:19,241] [INFO] [timer.py:197:stop] 0/4962, RunningAvgSamplesPerSec=6.330991543986501, CurrSamplesPerSec=5.71776031937525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:46:30,562] [INFO] [timer.py:197:stop] 0/4964, RunningAvgSamplesPerSec=6.330986218687438, CurrSamplesPerSec=5.683666064780153, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:46:41,955] [INFO] [timer.py:197:stop] 0/4966, RunningAvgSamplesPerSec=6.330967592057502, CurrSamplesPerSec=5.610417109930005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:46:53,258] [INFO] [timer.py:197:stop] 0/4968, RunningAvgSamplesPerSec=6.330971016468183, CurrSamplesPerSec=5.715656560645305, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 04:47:04,541] [INFO] [timer.py:197:stop] 0/4970, RunningAvgSamplesPerSec=6.3309794255496845, CurrSamplesPerSec=5.734612982910162, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:47:16,033] [INFO] [timer.py:197:stop] 0/4972, RunningAvgSamplesPerSec=6.330936383650636, CurrSamplesPerSec=5.529107192673739, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:47:27,313] [INFO] [timer.py:197:stop] 0/4974, RunningAvgSamplesPerSec=6.330946328971878, CurrSamplesPerSec=5.729163081966281, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:47:38,607] [INFO] [timer.py:197:stop] 0/4976, RunningAvgSamplesPerSec=6.330953184900717, CurrSamplesPerSec=5.726244121402355, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:47:50,207] [INFO] [timer.py:197:stop] 0/4978, RunningAvgSamplesPerSec=6.330953538099298, CurrSamplesPerSec=5.68242609069436, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:48:01,496] [INFO] [logging.py:68:log_dist] [Rank 0] step=2490, skipped=5, lr=[5.591111111111112e-06], mom=[[0.9, 0.999]] [2022-12-17 04:48:01,497] [INFO] [timer.py:197:stop] 0/4980, RunningAvgSamplesPerSec=6.33096013014356, CurrSamplesPerSec=5.694920305619834, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:48:12,985] [INFO] [timer.py:197:stop] 0/4982, RunningAvgSamplesPerSec=6.330969576658594, CurrSamplesPerSec=5.704752349116692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:48:24,480] [INFO] [timer.py:197:stop] 0/4984, RunningAvgSamplesPerSec=6.330971504946974, CurrSamplesPerSec=5.693244555334464, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:48:35,802] [INFO] [timer.py:197:stop] 0/4986, RunningAvgSamplesPerSec=6.33097084853042, CurrSamplesPerSec=5.688880208346578, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:48:47,248] [INFO] [timer.py:197:stop] 0/4988, RunningAvgSamplesPerSec=6.3309845144832435, CurrSamplesPerSec=5.742247609891225, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 04:48:58,752] [INFO] [timer.py:197:stop] 0/4990, RunningAvgSamplesPerSec=6.330976160204492, CurrSamplesPerSec=5.674807318807295, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:49:10,042] [INFO] [timer.py:197:stop] 0/4992, RunningAvgSamplesPerSec=6.3309835858007135, CurrSamplesPerSec=5.713671584841052, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:49:21,342] [INFO] [timer.py:197:stop] 0/4994, RunningAvgSamplesPerSec=6.330989254123995, CurrSamplesPerSec=5.733567190966732, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:49:32,658] [INFO] [timer.py:197:stop] 0/4996, RunningAvgSamplesPerSec=6.330985781418266, CurrSamplesPerSec=5.712304461873893, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:49:43,973] [INFO] [timer.py:197:stop] 0/4998, RunningAvgSamplesPerSec=6.330986864519867, CurrSamplesPerSec=5.688001440544016, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:49:55,252] [INFO] [logging.py:68:log_dist] [Rank 0] step=2500, skipped=5, lr=[5.56888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 04:49:55,254] [INFO] [timer.py:197:stop] 0/5000, RunningAvgSamplesPerSec=6.3309958963633886, CurrSamplesPerSec=5.718180282509813, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0025, 'learning_rate': 5.56888888888889e-06, 'epoch': 10.59} [2022-12-17 04:50:06,731] [INFO] [timer.py:197:stop] 0/5002, RunningAvgSamplesPerSec=6.3309565040950755, CurrSamplesPerSec=5.710061386113425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:50:18,001] [INFO] [timer.py:197:stop] 0/5004, RunningAvgSamplesPerSec=6.330958386668482, CurrSamplesPerSec=5.690579205791651, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:50:29,295] [INFO] [timer.py:197:stop] 0/5006, RunningAvgSamplesPerSec=6.33096474195664, CurrSamplesPerSec=5.70624128047807, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:50:40,632] [INFO] [timer.py:197:stop] 0/5008, 
RunningAvgSamplesPerSec=6.330959864889557, CurrSamplesPerSec=5.72188784057362, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:50:51,923] [INFO] [timer.py:197:stop] 0/5010, RunningAvgSamplesPerSec=6.330966035087153, CurrSamplesPerSec=5.7113448027691005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:51:03,174] [INFO] [timer.py:197:stop] 0/5012, RunningAvgSamplesPerSec=6.330974127924187, CurrSamplesPerSec=5.714591636296829, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:51:14,882] [INFO] [timer.py:197:stop] 0/5014, RunningAvgSamplesPerSec=6.330981396072004, CurrSamplesPerSec=5.719529258836262, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:51:26,372] [INFO] [timer.py:197:stop] 0/5016, RunningAvgSamplesPerSec=6.330988349564209, CurrSamplesPerSec=5.709294332972175, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:51:37,680] [INFO] [timer.py:197:stop] 0/5018, RunningAvgSamplesPerSec=6.330990138475348, CurrSamplesPerSec=5.697592868472862, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:51:49,248] [INFO] [logging.py:68:log_dist] [Rank 0] step=2510, skipped=5, lr=[5.546666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 04:51:49,250] [INFO] [timer.py:197:stop] 0/5020, RunningAvgSamplesPerSec=6.330983495949797, CurrSamplesPerSec=5.667921172586138, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:52:00,554] [INFO] [timer.py:197:stop] 0/5022, RunningAvgSamplesPerSec=6.330982696273461, CurrSamplesPerSec=5.700785823007439, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:52:12,061] [INFO] [timer.py:197:stop] 0/5024, RunningAvgSamplesPerSec=6.330933809736621, CurrSamplesPerSec=5.477729075145169, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:52:23,555] [INFO] [timer.py:197:stop] 0/5026, RunningAvgSamplesPerSec=6.330939921329972, CurrSamplesPerSec=5.714190932308468, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:52:34,858] [INFO] [timer.py:197:stop] 0/5028, 
RunningAvgSamplesPerSec=6.330943473294494, CurrSamplesPerSec=5.708029315662384, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:52:46,444] [INFO] [timer.py:197:stop] 0/5030, RunningAvgSamplesPerSec=6.330879968150159, CurrSamplesPerSec=5.430234106849332, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:52:57,897] [INFO] [timer.py:197:stop] 0/5032, RunningAvgSamplesPerSec=6.330886515803439, CurrSamplesPerSec=5.708739938717838, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:53:09,136] [INFO] [timer.py:197:stop] 0/5034, RunningAvgSamplesPerSec=6.330906438784912, CurrSamplesPerSec=5.731909991688547, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:53:20,483] [INFO] [timer.py:197:stop] 0/5036, RunningAvgSamplesPerSec=6.33089898868953, CurrSamplesPerSec=5.645936337730132, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:53:32,121] [INFO] [timer.py:197:stop] 0/5038, RunningAvgSamplesPerSec=6.3309053412966305, CurrSamplesPerSec=5.706116101744382, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:53:43,376] [INFO] [logging.py:68:log_dist] [Rank 0] step=2520, skipped=5, lr=[5.524444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 04:53:43,378] [INFO] [timer.py:197:stop] 0/5040, RunningAvgSamplesPerSec=6.330912990471432, CurrSamplesPerSec=5.70689030866712, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:53:54,672] [INFO] [timer.py:197:stop] 0/5042, RunningAvgSamplesPerSec=6.3309159228723315, CurrSamplesPerSec=5.6894517339217705, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:54:05,967] [INFO] [timer.py:197:stop] 0/5044, RunningAvgSamplesPerSec=6.3309220000445, CurrSamplesPerSec=5.7000314267222265, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:54:17,231] [INFO] [timer.py:197:stop] 0/5046, RunningAvgSamplesPerSec=6.330935750681338, CurrSamplesPerSec=5.738988914310895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:54:28,522] [INFO] [timer.py:197:stop] 0/5048, 
RunningAvgSamplesPerSec=6.330939183099069, CurrSamplesPerSec=5.689545311073165, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:54:39,793] [INFO] [timer.py:197:stop] 0/5050, RunningAvgSamplesPerSec=6.330947485759522, CurrSamplesPerSec=5.712155679007506, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0021, 'learning_rate': 5.513333333333334e-06, 'epoch': 10.7} [2022-12-17 04:54:51,133] [INFO] [timer.py:197:stop] 0/5052, RunningAvgSamplesPerSec=6.33095712793302, CurrSamplesPerSec=5.7344730809313536, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:55:02,649] [INFO] [timer.py:197:stop] 0/5054, RunningAvgSamplesPerSec=6.330958595704715, CurrSamplesPerSec=5.691896602840322, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:55:13,982] [INFO] [timer.py:197:stop] 0/5056, RunningAvgSamplesPerSec=6.330955353948697, CurrSamplesPerSec=5.671333757852513, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:55:25,520] [INFO] [timer.py:197:stop] 0/5058, RunningAvgSamplesPerSec=6.330959551487751, CurrSamplesPerSec=5.705458275887765, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:55:36,931] [INFO] [logging.py:68:log_dist] [Rank 0] step=2530, skipped=5, lr=[5.5022222222222224e-06], mom=[[0.9, 0.999]] [2022-12-17 04:55:36,933] [INFO] [timer.py:197:stop] 0/5060, RunningAvgSamplesPerSec=6.330948864322162, CurrSamplesPerSec=5.692745185578915, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:55:48,218] [INFO] [timer.py:197:stop] 0/5062, RunningAvgSamplesPerSec=6.330952978062427, CurrSamplesPerSec=5.707607201453538, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:55:59,759] [INFO] [timer.py:197:stop] 0/5064, RunningAvgSamplesPerSec=6.330951656585579, CurrSamplesPerSec=5.691243740564786, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:56:11,357] [INFO] [timer.py:197:stop] 0/5066, RunningAvgSamplesPerSec=6.3309381307146335, CurrSamplesPerSec=5.699524088901511, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-17 04:56:22,657] [INFO] [timer.py:197:stop] 0/5068, RunningAvgSamplesPerSec=6.33094166677426, CurrSamplesPerSec=5.704085140967916, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:56:33,962] [INFO] [timer.py:197:stop] 0/5070, RunningAvgSamplesPerSec=6.330944843263456, CurrSamplesPerSec=5.68741405914248, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:56:45,316] [INFO] [timer.py:197:stop] 0/5072, RunningAvgSamplesPerSec=6.330935756879842, CurrSamplesPerSec=5.718039719534393, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:56:56,621] [INFO] [timer.py:197:stop] 0/5074, RunningAvgSamplesPerSec=6.330939117650655, CurrSamplesPerSec=5.707029838678644, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:57:07,918] [INFO] [timer.py:197:stop] 0/5076, RunningAvgSamplesPerSec=6.330944165593699, CurrSamplesPerSec=5.705340164667865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:57:19,572] [INFO] [timer.py:197:stop] 0/5078, RunningAvgSamplesPerSec=6.3309475667119015, CurrSamplesPerSec=5.720259082428829, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:57:30,872] [INFO] [logging.py:68:log_dist] [Rank 0] step=2540, skipped=5, lr=[5.480000000000001e-06], mom=[[0.9, 0.999]] [2022-12-17 04:57:30,874] [INFO] [timer.py:197:stop] 0/5080, RunningAvgSamplesPerSec=6.330950895937812, CurrSamplesPerSec=5.7041610183311, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:57:42,225] [INFO] [timer.py:197:stop] 0/5082, RunningAvgSamplesPerSec=6.3309426889380465, CurrSamplesPerSec=5.622238335501086, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:57:53,760] [INFO] [timer.py:197:stop] 0/5084, RunningAvgSamplesPerSec=6.330952854774201, CurrSamplesPerSec=5.729616273990773, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:58:05,143] [INFO] [timer.py:197:stop] 0/5086, RunningAvgSamplesPerSec=6.3309542655873265, CurrSamplesPerSec=5.703234143735746, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
04:58:16,445] [INFO] [timer.py:197:stop] 0/5088, RunningAvgSamplesPerSec=6.330960905784509, CurrSamplesPerSec=5.7160389692080775, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:58:28,076] [INFO] [timer.py:197:stop] 0/5090, RunningAvgSamplesPerSec=6.330968824088808, CurrSamplesPerSec=5.706404554792997, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:58:39,364] [INFO] [timer.py:197:stop] 0/5092, RunningAvgSamplesPerSec=6.330972199325294, CurrSamplesPerSec=5.693679764246615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:58:50,726] [INFO] [timer.py:197:stop] 0/5094, RunningAvgSamplesPerSec=6.330961521446245, CurrSamplesPerSec=5.635776153192244, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:59:02,022] [INFO] [timer.py:197:stop] 0/5096, RunningAvgSamplesPerSec=6.33096897362408, CurrSamplesPerSec=5.719917061077151, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:59:13,270] [INFO] [timer.py:197:stop] 0/5098, RunningAvgSamplesPerSec=6.330985743984386, CurrSamplesPerSec=5.742667983797375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:59:24,542] [INFO] [logging.py:68:log_dist] [Rank 0] step=2550, skipped=5, lr=[5.4577777777777785e-06], mom=[[0.9, 0.999]] [2022-12-17 04:59:24,543] [INFO] [timer.py:197:stop] 0/5100, RunningAvgSamplesPerSec=6.330996104084737, CurrSamplesPerSec=5.716025093527925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0018, 'learning_rate': 5.4577777777777785e-06, 'epoch': 10.81} [2022-12-17 04:59:35,910] [INFO] [timer.py:197:stop] 0/5102, RunningAvgSamplesPerSec=6.330983921264206, CurrSamplesPerSec=5.626090613198088, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:59:47,213] [INFO] [timer.py:197:stop] 0/5104, RunningAvgSamplesPerSec=6.330988431641169, CurrSamplesPerSec=5.70301798095724, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 04:59:58,451] [INFO] [timer.py:197:stop] 0/5106, RunningAvgSamplesPerSec=6.331007527523088, 
CurrSamplesPerSec=5.757828958869349, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:00:09,982] [INFO] [timer.py:197:stop] 0/5108, RunningAvgSamplesPerSec=6.331019128346327, CurrSamplesPerSec=5.731908033391001, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:00:21,267] [INFO] [timer.py:197:stop] 0/5110, RunningAvgSamplesPerSec=6.331032777766586, CurrSamplesPerSec=5.727329034819482, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:00:32,810] [INFO] [timer.py:197:stop] 0/5112, RunningAvgSamplesPerSec=6.330977003613128, CurrSamplesPerSec=5.451910508189052, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:00:44,368] [INFO] [timer.py:197:stop] 0/5114, RunningAvgSamplesPerSec=6.330978114522191, CurrSamplesPerSec=5.696982224919324, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:00:55,670] [INFO] [timer.py:197:stop] 0/5116, RunningAvgSamplesPerSec=6.330987497891552, CurrSamplesPerSec=5.707285135595816, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:01:07,354] [INFO] [timer.py:197:stop] 0/5118, RunningAvgSamplesPerSec=6.330899081644487, CurrSamplesPerSec=5.3293482591493655, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:01:18,753] [INFO] [logging.py:68:log_dist] [Rank 0] step=2560, skipped=5, lr=[5.435555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 05:01:18,755] [INFO] [timer.py:197:stop] 0/5120, RunningAvgSamplesPerSec=6.330901902893264, CurrSamplesPerSec=5.715834005486816, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:01:30,224] [INFO] [timer.py:197:stop] 0/5122, RunningAvgSamplesPerSec=6.330903788047218, CurrSamplesPerSec=5.690690433368548, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:01:41,549] [INFO] [timer.py:197:stop] 0/5124, RunningAvgSamplesPerSec=6.330899001074299, CurrSamplesPerSec=5.678515045294561, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:01:53,137] [INFO] [timer.py:197:stop] 0/5126, RunningAvgSamplesPerSec=6.330897736958658, 
CurrSamplesPerSec=5.681188581320298, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:02:04,450] [INFO] [timer.py:197:stop] 0/5128, RunningAvgSamplesPerSec=6.330906470721249, CurrSamplesPerSec=5.7209881168777335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:02:16,010] [INFO] [timer.py:197:stop] 0/5130, RunningAvgSamplesPerSec=6.330847665324243, CurrSamplesPerSec=5.448622762386459, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:02:27,308] [INFO] [timer.py:197:stop] 0/5132, RunningAvgSamplesPerSec=6.330851968143579, CurrSamplesPerSec=5.695187811565611, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:02:38,633] [INFO] [timer.py:197:stop] 0/5134, RunningAvgSamplesPerSec=6.3308504728612265, CurrSamplesPerSec=5.6930532969919625, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:02:50,042] [INFO] [timer.py:197:stop] 0/5136, RunningAvgSamplesPerSec=6.330828568356297, CurrSamplesPerSec=5.590546942353913, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:03:01,351] [INFO] [timer.py:197:stop] 0/5138, RunningAvgSamplesPerSec=6.330831029018583, CurrSamplesPerSec=5.692884024759842, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:03:12,670] [INFO] [logging.py:68:log_dist] [Rank 0] step=2570, skipped=5, lr=[5.413333333333334e-06], mom=[[0.9, 0.999]] [2022-12-17 05:03:12,671] [INFO] [timer.py:197:stop] 0/5140, RunningAvgSamplesPerSec=6.330831830621784, CurrSamplesPerSec=5.692991961810583, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:03:24,122] [INFO] [timer.py:197:stop] 0/5142, RunningAvgSamplesPerSec=6.3307995165787485, CurrSamplesPerSec=5.557602468839612, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:03:35,475] [INFO] [timer.py:197:stop] 0/5144, RunningAvgSamplesPerSec=6.330791148922355, CurrSamplesPerSec=5.66601202336115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:03:46,865] [INFO] [timer.py:197:stop] 0/5146, RunningAvgSamplesPerSec=6.330793328574028, 
CurrSamplesPerSec=5.694739566054518, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:03:58,525] [INFO] [timer.py:197:stop] 0/5148, RunningAvgSamplesPerSec=6.3307985999441545, CurrSamplesPerSec=5.71201687060952, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:04:09,847] [INFO] [timer.py:197:stop] 0/5150, RunningAvgSamplesPerSec=6.330798264943373, CurrSamplesPerSec=5.688799432348064, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0022, 'learning_rate': 5.402222222222223e-06, 'epoch': 10.91} [2022-12-17 05:04:21,267] [INFO] [timer.py:197:stop] 0/5152, RunningAvgSamplesPerSec=6.3308049922936895, CurrSamplesPerSec=5.739957886590591, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:04:32,804] [INFO] [timer.py:197:stop] 0/5154, RunningAvgSamplesPerSec=6.330808207297696, CurrSamplesPerSec=5.714203582698553, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:04:44,095] [INFO] [timer.py:197:stop] 0/5156, RunningAvgSamplesPerSec=6.330814103852655, CurrSamplesPerSec=5.70229884876786, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:04:55,418] [INFO] [timer.py:197:stop] 0/5158, RunningAvgSamplesPerSec=6.330826468927088, CurrSamplesPerSec=5.736627004540996, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:05:06,939] [INFO] [logging.py:68:log_dist] [Rank 0] step=2580, skipped=5, lr=[5.391111111111111e-06], mom=[[0.9, 0.999]] [2022-12-17 05:05:06,941] [INFO] [timer.py:197:stop] 0/5160, RunningAvgSamplesPerSec=6.3308266043403725, CurrSamplesPerSec=5.698408553854269, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:05:18,251] [INFO] [timer.py:197:stop] 0/5162, RunningAvgSamplesPerSec=6.330825475976958, CurrSamplesPerSec=5.6952141527152, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:05:29,525] [INFO] [timer.py:197:stop] 0/5164, RunningAvgSamplesPerSec=6.330832549112537, CurrSamplesPerSec=5.7298214935295935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:05:41,133] [INFO] 
[timer.py:197:stop] 0/5166, RunningAvgSamplesPerSec=6.330825425802116, CurrSamplesPerSec=5.6973268294471975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:05:52,434] [INFO] [timer.py:197:stop] 0/5168, RunningAvgSamplesPerSec=6.3308264370216225, CurrSamplesPerSec=5.683646328760508, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:06:03,725] [INFO] [timer.py:197:stop] 0/5170, RunningAvgSamplesPerSec=6.330833681391844, CurrSamplesPerSec=5.717644133973545, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:06:15,065] [INFO] [timer.py:197:stop] 0/5172, RunningAvgSamplesPerSec=6.330828764624116, CurrSamplesPerSec=5.7181651783578396, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:06:26,405] [INFO] [timer.py:197:stop] 0/5174, RunningAvgSamplesPerSec=6.330823795604273, CurrSamplesPerSec=5.675452815519628, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:06:37,674] [INFO] [timer.py:197:stop] 0/5176, RunningAvgSamplesPerSec=6.33083273222414, CurrSamplesPerSec=5.721885157322745, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:06:49,031] [INFO] [timer.py:197:stop] 0/5178, RunningAvgSamplesPerSec=6.330822949193593, CurrSamplesPerSec=5.683569311458349, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:07:00,323] [INFO] [logging.py:68:log_dist] [Rank 0] step=2590, skipped=5, lr=[5.368888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 05:07:00,326] [INFO] [timer.py:197:stop] 0/5180, RunningAvgSamplesPerSec=6.3308245978339865, CurrSamplesPerSec=5.687645430029632, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:07:11,607] [INFO] [timer.py:197:stop] 0/5182, RunningAvgSamplesPerSec=6.330829600063001, CurrSamplesPerSec=5.702640220108133, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:07:22,922] [INFO] [timer.py:197:stop] 0/5184, RunningAvgSamplesPerSec=6.3308299773962595, CurrSamplesPerSec=5.6947854747502085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:07:34,227] [INFO] 
[timer.py:197:stop] 0/5186, RunningAvgSamplesPerSec=6.330840511623829, CurrSamplesPerSec=5.7253488887550406, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:07:45,694] [INFO] [timer.py:197:stop] 0/5188, RunningAvgSamplesPerSec=6.330802234493931, CurrSamplesPerSec=5.545931811238111, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:07:57,240] [INFO] [timer.py:197:stop] 0/5190, RunningAvgSamplesPerSec=6.3308106513252556, CurrSamplesPerSec=5.714275107111477, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:08:05,702] [INFO] [timer.py:197:stop] 0/5192, RunningAvgSamplesPerSec=6.331415568524979, CurrSamplesPerSec=10.224405036507427, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:08:16,994] [INFO] [timer.py:197:stop] 0/5194, RunningAvgSamplesPerSec=6.33142151258471, CurrSamplesPerSec=5.708924967667167, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:08:28,320] [INFO] [timer.py:197:stop] 0/5196, RunningAvgSamplesPerSec=6.331419035586852, CurrSamplesPerSec=5.720213493471337, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:08:39,593] [INFO] [timer.py:197:stop] 0/5198, RunningAvgSamplesPerSec=6.331429428749043, CurrSamplesPerSec=5.717296823800851, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:08:50,859] [INFO] [logging.py:68:log_dist] [Rank 0] step=2600, skipped=5, lr=[5.346666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 05:08:50,861] [INFO] [timer.py:197:stop] 0/5200, RunningAvgSamplesPerSec=6.33144061780039, CurrSamplesPerSec=5.7277724037809055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0016, 'learning_rate': 5.346666666666667e-06, 'epoch': 11.02} [2022-12-17 05:09:02,114] [INFO] [timer.py:197:stop] 0/5202, RunningAvgSamplesPerSec=6.331448483641698, CurrSamplesPerSec=5.719249225680997, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:09:13,434] [INFO] [timer.py:197:stop] 0/5204, RunningAvgSamplesPerSec=6.331448183327159, CurrSamplesPerSec=5.699337248682659, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:09:24,734] [INFO] [timer.py:197:stop] 0/5206, RunningAvgSamplesPerSec=6.331452657655695, CurrSamplesPerSec=5.700206208089163, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:09:36,014] [INFO] [timer.py:197:stop] 0/5208, RunningAvgSamplesPerSec=6.331461622033795, CurrSamplesPerSec=5.72109761000533, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:09:47,276] [INFO] [timer.py:197:stop] 0/5210, RunningAvgSamplesPerSec=6.3314746264100314, CurrSamplesPerSec=5.716316496958784, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:09:58,571] [INFO] [timer.py:197:stop] 0/5212, RunningAvgSamplesPerSec=6.331480978826958, CurrSamplesPerSec=5.725256083923386, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:10:09,843] [INFO] [timer.py:197:stop] 0/5214, RunningAvgSamplesPerSec=6.331491432789969, CurrSamplesPerSec=5.725326664166113, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:10:21,125] [INFO] [timer.py:197:stop] 0/5216, RunningAvgSamplesPerSec=6.331499899781304, CurrSamplesPerSec=5.7110269321049865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:10:32,392] [INFO] [timer.py:197:stop] 0/5218, RunningAvgSamplesPerSec=6.331511769858337, CurrSamplesPerSec=5.712665997638112, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:10:43,678] [INFO] [logging.py:68:log_dist] [Rank 0] step=2610, skipped=5, lr=[5.324444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 05:10:43,680] [INFO] [timer.py:197:stop] 0/5220, RunningAvgSamplesPerSec=6.331518616398237, CurrSamplesPerSec=5.7057262876552635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:10:54,977] [INFO] [timer.py:197:stop] 0/5222, RunningAvgSamplesPerSec=6.331523063457298, CurrSamplesPerSec=5.700985107994975, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:11:06,265] [INFO] [timer.py:197:stop] 0/5224, RunningAvgSamplesPerSec=6.331530146740582, CurrSamplesPerSec=5.712085180028847, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:11:17,572] [INFO] [timer.py:197:stop] 0/5226, RunningAvgSamplesPerSec=6.331532193459119, CurrSamplesPerSec=5.693007416219925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:11:28,880] [INFO] [timer.py:197:stop] 0/5228, RunningAvgSamplesPerSec=6.331534394982881, CurrSamplesPerSec=5.702848600275808, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:11:40,160] [INFO] [timer.py:197:stop] 0/5230, RunningAvgSamplesPerSec=6.331543863225301, CurrSamplesPerSec=5.727150386796147, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:11:51,450] [INFO] [timer.py:197:stop] 0/5232, RunningAvgSamplesPerSec=6.3315500633562944, CurrSamplesPerSec=5.715176370807533, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:12:02,731] [INFO] [timer.py:197:stop] 0/5234, RunningAvgSamplesPerSec=6.331554300362814, CurrSamplesPerSec=5.718985546768476, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:12:14,015] [INFO] [timer.py:197:stop] 0/5236, RunningAvgSamplesPerSec=6.3315630531989795, CurrSamplesPerSec=5.7190143016978965, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:12:25,279] [INFO] [timer.py:197:stop] 0/5238, RunningAvgSamplesPerSec=6.331575216395316, CurrSamplesPerSec=5.733168474540448, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:12:36,556] [INFO] [logging.py:68:log_dist] [Rank 0] step=2620, skipped=5, lr=[5.302222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 05:12:36,557] [INFO] [timer.py:197:stop] 0/5240, RunningAvgSamplesPerSec=6.331583901131349, CurrSamplesPerSec=5.7215373317447575, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:12:47,841] [INFO] [timer.py:197:stop] 0/5242, RunningAvgSamplesPerSec=6.331591212199751, CurrSamplesPerSec=5.712049201910398, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:12:59,137] [INFO] [timer.py:197:stop] 0/5244, RunningAvgSamplesPerSec=6.331595922910599, CurrSamplesPerSec=5.70338658227336, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:13:10,454] [INFO] [timer.py:197:stop] 0/5246, RunningAvgSamplesPerSec=6.3315953067483335, CurrSamplesPerSec=5.677981985152896, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:13:21,789] [INFO] [timer.py:197:stop] 0/5248, RunningAvgSamplesPerSec=6.331597185771025, CurrSamplesPerSec=5.705788625223031, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:13:33,159] [INFO] [timer.py:197:stop] 0/5250, RunningAvgSamplesPerSec=6.331600861122208, CurrSamplesPerSec=5.699541273008275, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0011, 'learning_rate': 5.2911111111111115e-06, 'epoch': 11.12} [2022-12-17 05:13:44,480] [INFO] [timer.py:197:stop] 0/5252, RunningAvgSamplesPerSec=6.331599661255974, CurrSamplesPerSec=5.68312072716045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:13:55,971] [INFO] [timer.py:197:stop] 0/5254, RunningAvgSamplesPerSec=6.331603344469879, CurrSamplesPerSec=5.695243394093243, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:14:07,295] [INFO] [timer.py:197:stop] 0/5256, RunningAvgSamplesPerSec=6.331597969028604, CurrSamplesPerSec=5.66516708705912, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:14:18,576] [INFO] [timer.py:197:stop] 0/5258, RunningAvgSamplesPerSec=6.331607527232612, CurrSamplesPerSec=5.733064396149069, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:14:29,890] [INFO] [logging.py:68:log_dist] [Rank 0] step=2630, skipped=5, lr=[5.28e-06], mom=[[0.9, 0.999]] [2022-12-17 05:14:29,893] [INFO] [timer.py:197:stop] 0/5260, RunningAvgSamplesPerSec=6.331607577304003, CurrSamplesPerSec=5.703569083004653, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:14:41,225] [INFO] [timer.py:197:stop] 0/5262, RunningAvgSamplesPerSec=6.331622020508325, CurrSamplesPerSec=5.732581277009406, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:14:52,454] [INFO] [timer.py:197:stop] 0/5264, 
RunningAvgSamplesPerSec=6.331639133639575, CurrSamplesPerSec=5.738322751616373, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:15:03,721] [INFO] [timer.py:197:stop] 0/5266, RunningAvgSamplesPerSec=6.33165171193482, CurrSamplesPerSec=5.7073268783289315, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:15:15,023] [INFO] [timer.py:197:stop] 0/5268, RunningAvgSamplesPerSec=6.331655237231968, CurrSamplesPerSec=5.6875104613911, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:15:26,312] [INFO] [timer.py:197:stop] 0/5270, RunningAvgSamplesPerSec=6.331662476795234, CurrSamplesPerSec=5.703980176569691, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:15:37,605] [INFO] [timer.py:197:stop] 0/5272, RunningAvgSamplesPerSec=6.33166852698621, CurrSamplesPerSec=5.722610214448023, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:15:48,870] [INFO] [timer.py:197:stop] 0/5274, RunningAvgSamplesPerSec=6.33168179634596, CurrSamplesPerSec=5.734479941114413, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:16:00,126] [INFO] [timer.py:197:stop] 0/5276, RunningAvgSamplesPerSec=6.331695376311775, CurrSamplesPerSec=5.709654759127998, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:16:11,408] [INFO] [timer.py:197:stop] 0/5278, RunningAvgSamplesPerSec=6.331700742788226, CurrSamplesPerSec=5.71670313316583, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:16:22,679] [INFO] [logging.py:68:log_dist] [Rank 0] step=2640, skipped=5, lr=[5.257777777777779e-06], mom=[[0.9, 0.999]] [2022-12-17 05:16:22,681] [INFO] [timer.py:197:stop] 0/5280, RunningAvgSamplesPerSec=6.331707264117074, CurrSamplesPerSec=5.704465032869232, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:16:34,130] [INFO] [timer.py:197:stop] 0/5282, RunningAvgSamplesPerSec=6.331710296952607, CurrSamplesPerSec=5.699808971206135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:16:45,425] [INFO] [timer.py:197:stop] 0/5284, 
RunningAvgSamplesPerSec=6.331715787203118, CurrSamplesPerSec=5.7361687804135455, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:16:56,757] [INFO] [timer.py:197:stop] 0/5286, RunningAvgSamplesPerSec=6.331716077735873, CurrSamplesPerSec=5.7076763763304585, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:17:08,057] [INFO] [timer.py:197:stop] 0/5288, RunningAvgSamplesPerSec=6.3317203145894725, CurrSamplesPerSec=5.714706237753005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:17:19,368] [INFO] [timer.py:197:stop] 0/5290, RunningAvgSamplesPerSec=6.331722296646484, CurrSamplesPerSec=5.699027246796303, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:17:30,729] [INFO] [timer.py:197:stop] 0/5292, RunningAvgSamplesPerSec=6.331713096218976, CurrSamplesPerSec=5.672534851579894, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:17:42,048] [INFO] [timer.py:197:stop] 0/5294, RunningAvgSamplesPerSec=6.331713210379387, CurrSamplesPerSec=5.700520696294161, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:17:53,394] [INFO] [timer.py:197:stop] 0/5296, RunningAvgSamplesPerSec=6.331707193585852, CurrSamplesPerSec=5.692714762593404, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:18:04,741] [INFO] [timer.py:197:stop] 0/5298, RunningAvgSamplesPerSec=6.3317006432874114, CurrSamplesPerSec=5.682712394124242, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:18:16,059] [INFO] [logging.py:68:log_dist] [Rank 0] step=2650, skipped=5, lr=[5.235555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 05:18:16,061] [INFO] [timer.py:197:stop] 0/5300, RunningAvgSamplesPerSec=6.331700367364679, CurrSamplesPerSec=5.707803322659478, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0012, 'learning_rate': 5.235555555555556e-06, 'epoch': 11.23} [2022-12-17 05:18:27,358] [INFO] [timer.py:197:stop] 0/5302, RunningAvgSamplesPerSec=6.331705986411841, CurrSamplesPerSec=5.715463793550649, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 05:18:38,619] [INFO] [timer.py:197:stop] 0/5304, RunningAvgSamplesPerSec=6.331712476197681, CurrSamplesPerSec=5.721632455156021, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:18:49,899] [INFO] [timer.py:197:stop] 0/5306, RunningAvgSamplesPerSec=6.331716192002087, CurrSamplesPerSec=5.710224879116211, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:19:01,197] [INFO] [timer.py:197:stop] 0/5308, RunningAvgSamplesPerSec=6.331718498716094, CurrSamplesPerSec=5.696512420788828, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:19:12,489] [INFO] [timer.py:197:stop] 0/5310, RunningAvgSamplesPerSec=6.331716879305524, CurrSamplesPerSec=5.6943266626871045, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:19:23,899] [INFO] [timer.py:197:stop] 0/5312, RunningAvgSamplesPerSec=6.331722781967347, CurrSamplesPerSec=5.709408964927718, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:19:35,201] [INFO] [timer.py:197:stop] 0/5314, RunningAvgSamplesPerSec=6.331726558596379, CurrSamplesPerSec=5.717023340647629, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:19:46,664] [INFO] [timer.py:197:stop] 0/5316, RunningAvgSamplesPerSec=6.33172464843963, CurrSamplesPerSec=5.688770015987062, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:19:57,951] [INFO] [timer.py:197:stop] 0/5318, RunningAvgSamplesPerSec=6.331731507360793, CurrSamplesPerSec=5.714302841549262, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:20:09,198] [INFO] [logging.py:68:log_dist] [Rank 0] step=2660, skipped=5, lr=[5.213333333333334e-06], mom=[[0.9, 0.999]] [2022-12-17 05:20:09,200] [INFO] [timer.py:197:stop] 0/5320, RunningAvgSamplesPerSec=6.331747607613066, CurrSamplesPerSec=5.7378510105983604, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:20:20,482] [INFO] [timer.py:197:stop] 0/5322, RunningAvgSamplesPerSec=6.33175641041769, CurrSamplesPerSec=5.727818113304493, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 05:20:31,752] [INFO] [timer.py:197:stop] 0/5324, RunningAvgSamplesPerSec=6.331768182998982, CurrSamplesPerSec=5.730554190945366, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:20:43,017] [INFO] [timer.py:197:stop] 0/5326, RunningAvgSamplesPerSec=6.331777148866358, CurrSamplesPerSec=5.71666222711335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:20:54,287] [INFO] [timer.py:197:stop] 0/5328, RunningAvgSamplesPerSec=6.331788167273932, CurrSamplesPerSec=5.723797489623256, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:21:05,551] [INFO] [timer.py:197:stop] 0/5330, RunningAvgSamplesPerSec=6.331800719530591, CurrSamplesPerSec=5.728999480872663, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:21:16,893] [INFO] [timer.py:197:stop] 0/5332, RunningAvgSamplesPerSec=6.331788250487433, CurrSamplesPerSec=5.615644743162119, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:21:28,160] [INFO] [timer.py:197:stop] 0/5334, RunningAvgSamplesPerSec=6.331799635445216, CurrSamplesPerSec=5.722074454648978, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:21:39,415] [INFO] [timer.py:197:stop] 0/5336, RunningAvgSamplesPerSec=6.331810093831878, CurrSamplesPerSec=5.7304957150562466, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:21:50,651] [INFO] [timer.py:197:stop] 0/5338, RunningAvgSamplesPerSec=6.331828149224771, CurrSamplesPerSec=5.744462205318481, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:22:01,961] [INFO] [logging.py:68:log_dist] [Rank 0] step=2670, skipped=5, lr=[5.1911111111111116e-06], mom=[[0.9, 0.999]] [2022-12-17 05:22:01,963] [INFO] [timer.py:197:stop] 0/5340, RunningAvgSamplesPerSec=6.331828700824646, CurrSamplesPerSec=5.703318723094472, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:22:13,215] [INFO] [timer.py:197:stop] 0/5342, RunningAvgSamplesPerSec=6.331843623981074, CurrSamplesPerSec=5.747786195521225, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 05:22:24,454] [INFO] [timer.py:197:stop] 0/5344, RunningAvgSamplesPerSec=6.3318581877227516, CurrSamplesPerSec=5.73446230067685, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:22:35,701] [INFO] [timer.py:197:stop] 0/5346, RunningAvgSamplesPerSec=6.331876988632842, CurrSamplesPerSec=5.737213559784555, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:22:46,939] [INFO] [timer.py:197:stop] 0/5348, RunningAvgSamplesPerSec=6.331884598979178, CurrSamplesPerSec=5.723346681922929, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:22:58,261] [INFO] [timer.py:197:stop] 0/5350, RunningAvgSamplesPerSec=6.331906735436208, CurrSamplesPerSec=5.763013587580053, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0012, 'learning_rate': 5.18e-06, 'epoch': 11.33} [2022-12-17 05:23:09,544] [INFO] [timer.py:197:stop] 0/5352, RunningAvgSamplesPerSec=6.331915096942259, CurrSamplesPerSec=5.732819765586832, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:23:20,792] [INFO] [timer.py:197:stop] 0/5354, RunningAvgSamplesPerSec=6.331930152230148, CurrSamplesPerSec=5.7316884677646, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:23:32,086] [INFO] [timer.py:197:stop] 0/5356, RunningAvgSamplesPerSec=6.3319429083498004, CurrSamplesPerSec=5.744001253755843, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:23:43,371] [INFO] [timer.py:197:stop] 0/5358, RunningAvgSamplesPerSec=6.331957376154482, CurrSamplesPerSec=5.740309183024957, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:23:54,647] [INFO] [logging.py:68:log_dist] [Rank 0] step=2680, skipped=5, lr=[5.168888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 05:23:54,648] [INFO] [timer.py:197:stop] 0/5360, RunningAvgSamplesPerSec=6.331973042164556, CurrSamplesPerSec=5.735553271893245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:24:05,930] [INFO] [timer.py:197:stop] 0/5362, RunningAvgSamplesPerSec=6.3319876983431085, 
CurrSamplesPerSec=5.732323467223491, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:24:17,235] [INFO] [timer.py:197:stop] 0/5364, RunningAvgSamplesPerSec=6.33199726527146, CurrSamplesPerSec=5.740857202934483, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:24:28,512] [INFO] [timer.py:197:stop] 0/5366, RunningAvgSamplesPerSec=6.3320090287277955, CurrSamplesPerSec=5.725859613384882, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:24:39,937] [INFO] [timer.py:197:stop] 0/5368, RunningAvgSamplesPerSec=6.332025183926404, CurrSamplesPerSec=5.734573290236572, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:24:51,212] [INFO] [timer.py:197:stop] 0/5370, RunningAvgSamplesPerSec=6.3320424111755464, CurrSamplesPerSec=5.725142768419131, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:25:02,503] [INFO] [timer.py:197:stop] 0/5372, RunningAvgSamplesPerSec=6.332055348608328, CurrSamplesPerSec=5.726065296764906, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:25:13,783] [INFO] [timer.py:197:stop] 0/5374, RunningAvgSamplesPerSec=6.332067066686548, CurrSamplesPerSec=5.70496088367438, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:25:25,093] [INFO] [timer.py:197:stop] 0/5376, RunningAvgSamplesPerSec=6.332075611655805, CurrSamplesPerSec=5.725530843801487, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:25:36,352] [INFO] [timer.py:197:stop] 0/5378, RunningAvgSamplesPerSec=6.332091996769401, CurrSamplesPerSec=5.747384268558034, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:25:47,653] [INFO] [logging.py:68:log_dist] [Rank 0] step=2690, skipped=5, lr=[5.146666666666668e-06], mom=[[0.9, 0.999]] [2022-12-17 05:25:47,655] [INFO] [timer.py:197:stop] 0/5380, RunningAvgSamplesPerSec=6.332102105174284, CurrSamplesPerSec=5.742849321290859, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:25:58,906] [INFO] [timer.py:197:stop] 0/5382, RunningAvgSamplesPerSec=6.332119985619664, 
CurrSamplesPerSec=5.75074712347447, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:26:10,209] [INFO] [timer.py:197:stop] 0/5384, RunningAvgSamplesPerSec=6.332129202412391, CurrSamplesPerSec=5.7269793252631995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:26:21,476] [INFO] [timer.py:197:stop] 0/5386, RunningAvgSamplesPerSec=6.332142626128737, CurrSamplesPerSec=5.722102508797743, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:26:32,768] [INFO] [timer.py:197:stop] 0/5388, RunningAvgSamplesPerSec=6.332153973047589, CurrSamplesPerSec=5.720645764517236, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:26:44,052] [INFO] [timer.py:197:stop] 0/5390, RunningAvgSamplesPerSec=6.332168222200867, CurrSamplesPerSec=5.758247911655947, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:26:55,350] [INFO] [timer.py:197:stop] 0/5392, RunningAvgSamplesPerSec=6.332179117202303, CurrSamplesPerSec=5.721149797514245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:27:06,619] [INFO] [timer.py:197:stop] 0/5394, RunningAvgSamplesPerSec=6.332194122074683, CurrSamplesPerSec=5.742245153181966, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:27:17,923] [INFO] [timer.py:197:stop] 0/5396, RunningAvgSamplesPerSec=6.332204587974897, CurrSamplesPerSec=5.7320757174664925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:27:29,221] [INFO] [timer.py:197:stop] 0/5398, RunningAvgSamplesPerSec=6.332216127969198, CurrSamplesPerSec=5.722261568824939, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:27:40,523] [INFO] [logging.py:68:log_dist] [Rank 0] step=2700, skipped=5, lr=[5.124444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 05:27:40,525] [INFO] [timer.py:197:stop] 0/5400, RunningAvgSamplesPerSec=6.332226431537867, CurrSamplesPerSec=5.727787558729274, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0011, 'learning_rate': 5.124444444444445e-06, 'epoch': 11.44} [2022-12-17 05:27:51,834] [INFO] 
[timer.py:197:stop] 0/5402, RunningAvgSamplesPerSec=6.3322330237590885, CurrSamplesPerSec=5.6972134076352186, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:28:03,079] [INFO] [timer.py:197:stop] 0/5404, RunningAvgSamplesPerSec=6.332250320317892, CurrSamplesPerSec=5.752729839098712, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:28:14,392] [INFO] [timer.py:197:stop] 0/5406, RunningAvgSamplesPerSec=6.332256759997134, CurrSamplesPerSec=5.721441235920432, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:28:25,656] [INFO] [timer.py:197:stop] 0/5408, RunningAvgSamplesPerSec=6.332269340859704, CurrSamplesPerSec=5.725855949325616, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:28:36,913] [INFO] [timer.py:197:stop] 0/5410, RunningAvgSamplesPerSec=6.3322827126004, CurrSamplesPerSec=5.731097413139867, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:28:48,192] [INFO] [timer.py:197:stop] 0/5412, RunningAvgSamplesPerSec=6.332287783761879, CurrSamplesPerSec=5.724056974223865, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:28:59,486] [INFO] [timer.py:197:stop] 0/5414, RunningAvgSamplesPerSec=6.332293646201773, CurrSamplesPerSec=5.699748942659988, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:29:10,933] [INFO] [timer.py:197:stop] 0/5416, RunningAvgSamplesPerSec=6.3323149190795425, CurrSamplesPerSec=5.744511623716563, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:29:22,206] [INFO] [timer.py:197:stop] 0/5418, RunningAvgSamplesPerSec=6.332322147474618, CurrSamplesPerSec=5.681815807331453, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:29:33,427] [INFO] [logging.py:68:log_dist] [Rank 0] step=2710, skipped=5, lr=[5.102222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 05:29:33,429] [INFO] [timer.py:197:stop] 0/5420, RunningAvgSamplesPerSec=6.332336561497905, CurrSamplesPerSec=5.7329387724417895, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:29:44,696] [INFO] 
[timer.py:197:stop] 0/5422, RunningAvgSamplesPerSec=6.332344831258414, CurrSamplesPerSec=5.70700921207328, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:29:55,939] [INFO] [timer.py:197:stop] 0/5424, RunningAvgSamplesPerSec=6.332363229604519, CurrSamplesPerSec=5.73533685803425, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:30:07,205] [INFO] [timer.py:197:stop] 0/5426, RunningAvgSamplesPerSec=6.332375270831397, CurrSamplesPerSec=5.724526938018898, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:30:18,440] [INFO] [timer.py:197:stop] 0/5428, RunningAvgSamplesPerSec=6.332391038467731, CurrSamplesPerSec=5.740212455607973, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:30:29,715] [INFO] [timer.py:197:stop] 0/5430, RunningAvgSamplesPerSec=6.3324006160843105, CurrSamplesPerSec=5.7166308175061005, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:30:40,990] [INFO] [timer.py:197:stop] 0/5432, RunningAvgSamplesPerSec=6.332410610941344, CurrSamplesPerSec=5.717756665674356, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:30:52,266] [INFO] [timer.py:197:stop] 0/5434, RunningAvgSamplesPerSec=6.332420587194725, CurrSamplesPerSec=5.7336568363246085, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:31:03,530] [INFO] [timer.py:197:stop] 0/5436, RunningAvgSamplesPerSec=6.3324257712121925, CurrSamplesPerSec=5.696499365064274, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:31:14,767] [INFO] [timer.py:197:stop] 0/5438, RunningAvgSamplesPerSec=6.332444231804994, CurrSamplesPerSec=5.7353971483628605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:31:26,034] [INFO] [logging.py:68:log_dist] [Rank 0] step=2720, skipped=5, lr=[5.0800000000000005e-06], mom=[[0.9, 0.999]] [2022-12-17 05:31:26,036] [INFO] [timer.py:197:stop] 0/5440, RunningAvgSamplesPerSec=6.3324517740998765, CurrSamplesPerSec=5.721149797514245, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:31:37,312] [INFO] 
[timer.py:197:stop] 0/5442, RunningAvgSamplesPerSec=6.332461417845187, CurrSamplesPerSec=5.694489497472948, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:31:48,559] [INFO] [timer.py:197:stop] 0/5444, RunningAvgSamplesPerSec=6.332477879576189, CurrSamplesPerSec=5.737828443549427, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:31:59,873] [INFO] [timer.py:197:stop] 0/5446, RunningAvgSamplesPerSec=6.332479712229146, CurrSamplesPerSec=5.698187434566532, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:32:11,115] [INFO] [timer.py:197:stop] 0/5448, RunningAvgSamplesPerSec=6.332497359605725, CurrSamplesPerSec=5.72765410016634, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:32:22,609] [INFO] [timer.py:197:stop] 0/5450, RunningAvgSamplesPerSec=6.332510747841616, CurrSamplesPerSec=5.726563199659353, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.001, 'learning_rate': 5.06888888888889e-06, 'epoch': 11.55} [2022-12-17 05:32:33,869] [INFO] [timer.py:197:stop] 0/5452, RunningAvgSamplesPerSec=6.332524405237324, CurrSamplesPerSec=5.726556114074656, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:32:45,116] [INFO] [timer.py:197:stop] 0/5454, RunningAvgSamplesPerSec=6.332536431214967, CurrSamplesPerSec=5.723373772312986, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:32:56,506] [INFO] [timer.py:197:stop] 0/5456, RunningAvgSamplesPerSec=6.332548455978119, CurrSamplesPerSec=5.7462179403283145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:33:07,765] [INFO] [timer.py:197:stop] 0/5458, RunningAvgSamplesPerSec=6.332557599737585, CurrSamplesPerSec=5.7280026698697135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:33:19,023] [INFO] [logging.py:68:log_dist] [Rank 0] step=2730, skipped=5, lr=[5.057777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 05:33:19,025] [INFO] [timer.py:197:stop] 0/5460, RunningAvgSamplesPerSec=6.332571210787725, CurrSamplesPerSec=5.732027736787424, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:33:30,266] [INFO] [timer.py:197:stop] 0/5462, RunningAvgSamplesPerSec=6.332581482932561, CurrSamplesPerSec=5.711506667941586, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:33:41,533] [INFO] [timer.py:197:stop] 0/5464, RunningAvgSamplesPerSec=6.332593738459713, CurrSamplesPerSec=5.729059148948412, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:33:52,804] [INFO] [timer.py:197:stop] 0/5466, RunningAvgSamplesPerSec=6.332604211069591, CurrSamplesPerSec=5.727068031611117, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:34:04,060] [INFO] [timer.py:197:stop] 0/5468, RunningAvgSamplesPerSec=6.3326183044704, CurrSamplesPerSec=5.736048413717722, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:34:15,314] [INFO] [timer.py:197:stop] 0/5470, RunningAvgSamplesPerSec=6.3326320905877935, CurrSamplesPerSec=5.743125526686731, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:34:26,583] [INFO] [timer.py:197:stop] 0/5472, RunningAvgSamplesPerSec=6.332643210929138, CurrSamplesPerSec=5.72707291909308, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:34:37,832] [INFO] [timer.py:197:stop] 0/5474, RunningAvgSamplesPerSec=6.332658774956965, CurrSamplesPerSec=5.7351113932424616, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:34:49,079] [INFO] [timer.py:197:stop] 0/5476, RunningAvgSamplesPerSec=6.332671219062701, CurrSamplesPerSec=5.729178977936781, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:35:00,334] [INFO] [timer.py:197:stop] 0/5478, RunningAvgSamplesPerSec=6.332685789743696, CurrSamplesPerSec=5.738412545745183, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:35:11,572] [INFO] [logging.py:68:log_dist] [Rank 0] step=2740, skipped=5, lr=[5.035555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 05:35:11,574] [INFO] [timer.py:197:stop] 0/5480, RunningAvgSamplesPerSec=6.332703090955735, CurrSamplesPerSec=5.726971749917317, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:35:22,856] [INFO] [timer.py:197:stop] 0/5482, RunningAvgSamplesPerSec=6.3327110287281245, CurrSamplesPerSec=5.710764253771904, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:35:34,087] [INFO] [timer.py:197:stop] 0/5484, RunningAvgSamplesPerSec=6.332726896523725, CurrSamplesPerSec=5.744172104506499, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:35:45,344] [INFO] [timer.py:197:stop] 0/5486, RunningAvgSamplesPerSec=6.332737779608204, CurrSamplesPerSec=5.7382054837343786, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:35:56,769] [INFO] [timer.py:197:stop] 0/5488, RunningAvgSamplesPerSec=6.332749506085508, CurrSamplesPerSec=5.738857387065375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:36:08,018] [INFO] [timer.py:197:stop] 0/5490, RunningAvgSamplesPerSec=6.332762552584889, CurrSamplesPerSec=5.727755782316844, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:36:19,293] [INFO] [timer.py:197:stop] 0/5492, RunningAvgSamplesPerSec=6.332772741523956, CurrSamplesPerSec=5.733351171866924, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:36:30,595] [INFO] [timer.py:197:stop] 0/5494, RunningAvgSamplesPerSec=6.332773836264852, CurrSamplesPerSec=5.676408612249814, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:36:41,894] [INFO] [timer.py:197:stop] 0/5496, RunningAvgSamplesPerSec=6.332779233746077, CurrSamplesPerSec=5.698465892839091, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:36:53,225] [INFO] [timer.py:197:stop] 0/5498, RunningAvgSamplesPerSec=6.332780467389881, CurrSamplesPerSec=5.677425009408602, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:37:04,548] [INFO] [logging.py:68:log_dist] [Rank 0] step=2750, skipped=5, lr=[5.013333333333333e-06], mom=[[0.9, 0.999]] [2022-12-17 05:37:04,550] [INFO] [timer.py:197:stop] 0/5500, RunningAvgSamplesPerSec=6.33278092368322, CurrSamplesPerSec=5.686835714271792, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0012, 'learning_rate': 5.013333333333333e-06, 'epoch': 11.65} [2022-12-17 05:37:15,905] [INFO] [timer.py:197:stop] 0/5502, RunningAvgSamplesPerSec=6.332783612977782, CurrSamplesPerSec=5.701471877168966, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:37:27,211] [INFO] [timer.py:197:stop] 0/5504, RunningAvgSamplesPerSec=6.332793342968272, CurrSamplesPerSec=5.712175856621704, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:37:38,561] [INFO] [timer.py:197:stop] 0/5506, RunningAvgSamplesPerSec=6.332789638900559, CurrSamplesPerSec=5.665064984738845, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:37:49,847] [INFO] [timer.py:197:stop] 0/5508, RunningAvgSamplesPerSec=6.332797026608971, CurrSamplesPerSec=5.710434299877502, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:38:01,243] [INFO] [timer.py:197:stop] 0/5510, RunningAvgSamplesPerSec=6.3328017604467, CurrSamplesPerSec=5.685818353795128, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:38:12,604] [INFO] [timer.py:197:stop] 0/5512, RunningAvgSamplesPerSec=6.3328063775165155, CurrSamplesPerSec=5.723522652006832, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:38:23,903] [INFO] [timer.py:197:stop] 0/5514, RunningAvgSamplesPerSec=6.33281705103715, CurrSamplesPerSec=5.715605203494661, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:38:35,238] [INFO] [timer.py:197:stop] 0/5516, RunningAvgSamplesPerSec=6.332823395964448, CurrSamplesPerSec=5.708910398023532, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:38:46,529] [INFO] [timer.py:197:stop] 0/5518, RunningAvgSamplesPerSec=6.33282605944014, CurrSamplesPerSec=5.6991516306995935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:38:57,861] [INFO] [logging.py:68:log_dist] [Rank 0] step=2760, skipped=5, lr=[4.991111111111112e-06], mom=[[0.9, 0.999]] [2022-12-17 05:38:57,863] [INFO] [timer.py:197:stop] 0/5520, 
RunningAvgSamplesPerSec=6.33282949171402, CurrSamplesPerSec=5.710830589312635, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:39:09,216] [INFO] [timer.py:197:stop] 0/5522, RunningAvgSamplesPerSec=6.332837247559131, CurrSamplesPerSec=5.6983633125161095, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:39:20,443] [INFO] [timer.py:197:stop] 0/5524, RunningAvgSamplesPerSec=6.332858539457194, CurrSamplesPerSec=5.745685130183153, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:39:31,700] [INFO] [timer.py:197:stop] 0/5526, RunningAvgSamplesPerSec=6.332865922446157, CurrSamplesPerSec=5.710178478218908, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:39:42,975] [INFO] [timer.py:197:stop] 0/5528, RunningAvgSamplesPerSec=6.332877851007105, CurrSamplesPerSec=5.728147390091605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:39:54,237] [INFO] [timer.py:197:stop] 0/5530, RunningAvgSamplesPerSec=6.332890099141983, CurrSamplesPerSec=5.72886621035573, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:40:05,451] [INFO] [timer.py:197:stop] 0/5532, RunningAvgSamplesPerSec=6.332902675261394, CurrSamplesPerSec=5.730549542190849, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:40:16,803] [INFO] [timer.py:197:stop] 0/5534, RunningAvgSamplesPerSec=6.33290837551137, CurrSamplesPerSec=5.707021102686406, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:40:28,091] [INFO] [timer.py:197:stop] 0/5536, RunningAvgSamplesPerSec=6.332914417331108, CurrSamplesPerSec=5.70401605311191, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:40:39,392] [INFO] [timer.py:197:stop] 0/5538, RunningAvgSamplesPerSec=6.33291884533572, CurrSamplesPerSec=5.687822104017034, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:40:50,946] [INFO] [logging.py:68:log_dist] [Rank 0] step=2770, skipped=5, lr=[4.968888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 05:40:50,948] [INFO] [timer.py:197:stop] 0/5540, 
RunningAvgSamplesPerSec=6.332925819119135, CurrSamplesPerSec=5.72190881880358, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:41:02,276] [INFO] [timer.py:197:stop] 0/5542, RunningAvgSamplesPerSec=6.3329356245064075, CurrSamplesPerSec=5.722274254974737, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:41:13,559] [INFO] [timer.py:197:stop] 0/5544, RunningAvgSamplesPerSec=6.332943514845411, CurrSamplesPerSec=5.684454414219593, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:41:24,945] [INFO] [timer.py:197:stop] 0/5546, RunningAvgSamplesPerSec=6.332950889628434, CurrSamplesPerSec=5.713319163448986, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:41:36,237] [INFO] [timer.py:197:stop] 0/5548, RunningAvgSamplesPerSec=6.332957682383578, CurrSamplesPerSec=5.703936058737852, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:41:47,562] [INFO] [timer.py:197:stop] 0/5550, RunningAvgSamplesPerSec=6.332956047074123, CurrSamplesPerSec=5.685354242281351, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0012, 'learning_rate': 4.957777777777778e-06, 'epoch': 11.76} [2022-12-17 05:41:58,980] [INFO] [timer.py:197:stop] 0/5552, RunningAvgSamplesPerSec=6.332966331820771, CurrSamplesPerSec=5.733141536241398, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:42:10,228] [INFO] [timer.py:197:stop] 0/5554, RunningAvgSamplesPerSec=6.332978090415024, CurrSamplesPerSec=5.718462160113856, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:42:21,520] [INFO] [timer.py:197:stop] 0/5556, RunningAvgSamplesPerSec=6.332983898220038, CurrSamplesPerSec=5.697712836015588, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:42:32,801] [INFO] [timer.py:197:stop] 0/5558, RunningAvgSamplesPerSec=6.332992313775625, CurrSamplesPerSec=5.715544841767603, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:42:44,184] [INFO] [logging.py:68:log_dist] [Rank 0] step=2780, skipped=5, lr=[4.946666666666667e-06], mom=[[0.9, 0.999]] 
[2022-12-17 05:42:44,186] [INFO] [timer.py:197:stop] 0/5560, RunningAvgSamplesPerSec=6.332978351255422, CurrSamplesPerSec=5.627949114964652, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:42:55,885] [INFO] [timer.py:197:stop] 0/5562, RunningAvgSamplesPerSec=6.332894527720718, CurrSamplesPerSec=5.328057955897367, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:43:07,213] [INFO] [timer.py:197:stop] 0/5564, RunningAvgSamplesPerSec=6.332891784684995, CurrSamplesPerSec=5.661491093777932, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:43:18,518] [INFO] [timer.py:197:stop] 0/5566, RunningAvgSamplesPerSec=6.332894393890626, CurrSamplesPerSec=5.710761823927876, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:43:29,902] [INFO] [timer.py:197:stop] 0/5568, RunningAvgSamplesPerSec=6.332878696942575, CurrSamplesPerSec=5.613556043012562, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:43:41,167] [INFO] [timer.py:197:stop] 0/5570, RunningAvgSamplesPerSec=6.332889244247719, CurrSamplesPerSec=5.7199923851472185, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:43:52,430] [INFO] [timer.py:197:stop] 0/5572, RunningAvgSamplesPerSec=6.332901017021467, CurrSamplesPerSec=5.740034966821891, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:44:03,768] [INFO] [timer.py:197:stop] 0/5574, RunningAvgSamplesPerSec=6.332896168709029, CurrSamplesPerSec=5.663117615475737, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:44:15,088] [INFO] [timer.py:197:stop] 0/5576, RunningAvgSamplesPerSec=6.332895772682674, CurrSamplesPerSec=5.692906722591734, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:44:26,629] [INFO] [timer.py:197:stop] 0/5578, RunningAvgSamplesPerSec=6.332905969270301, CurrSamplesPerSec=5.718397839872948, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:44:38,107] [INFO] [logging.py:68:log_dist] [Rank 0] step=2790, skipped=5, lr=[4.924444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 
05:44:38,108] [INFO] [timer.py:197:stop] 0/5580, RunningAvgSamplesPerSec=6.332897487884689, CurrSamplesPerSec=5.719037939441172, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:44:49,419] [INFO] [timer.py:197:stop] 0/5582, RunningAvgSamplesPerSec=6.332897838621812, CurrSamplesPerSec=5.70153751260587, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:45:00,687] [INFO] [timer.py:197:stop] 0/5584, RunningAvgSamplesPerSec=6.3329080849694765, CurrSamplesPerSec=5.710074018225663, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:45:12,195] [INFO] [timer.py:197:stop] 0/5586, RunningAvgSamplesPerSec=6.332903922957749, CurrSamplesPerSec=5.720801817491398, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:45:23,531] [INFO] [timer.py:197:stop] 0/5588, RunningAvgSamplesPerSec=6.332901741022134, CurrSamplesPerSec=5.674606021061, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:45:34,856] [INFO] [timer.py:197:stop] 0/5590, RunningAvgSamplesPerSec=6.332897804579388, CurrSamplesPerSec=5.681754473526077, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:45:46,217] [INFO] [timer.py:197:stop] 0/5592, RunningAvgSamplesPerSec=6.33288074159132, CurrSamplesPerSec=5.6802431940249365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:45:57,527] [INFO] [timer.py:197:stop] 0/5594, RunningAvgSamplesPerSec=6.3328827302159905, CurrSamplesPerSec=5.692431554689089, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:46:08,808] [INFO] [timer.py:197:stop] 0/5596, RunningAvgSamplesPerSec=6.332891171460052, CurrSamplesPerSec=5.714777044771578, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:46:20,139] [INFO] [timer.py:197:stop] 0/5598, RunningAvgSamplesPerSec=6.332887723565751, CurrSamplesPerSec=5.698770752387616, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:46:31,417] [INFO] [logging.py:68:log_dist] [Rank 0] step=2800, skipped=5, lr=[4.902222222222222e-06], mom=[[0.9, 0.999]] [2022-12-17 05:46:31,419] 
[INFO] [timer.py:197:stop] 0/5600, RunningAvgSamplesPerSec=6.332889171708729, CurrSamplesPerSec=5.688668025550359, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0017, 'learning_rate': 4.902222222222222e-06, 'epoch': 11.86} [2022-12-17 05:46:42,751] [INFO] [timer.py:197:stop] 0/5602, RunningAvgSamplesPerSec=6.332892828513426, CurrSamplesPerSec=5.702749981251817, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:46:54,210] [INFO] [timer.py:197:stop] 0/5604, RunningAvgSamplesPerSec=6.332901936235582, CurrSamplesPerSec=5.735084681740569, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:47:05,700] [INFO] [timer.py:197:stop] 0/5606, RunningAvgSamplesPerSec=6.33290293370556, CurrSamplesPerSec=5.691225158507459, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:47:17,388] [INFO] [timer.py:197:stop] 0/5608, RunningAvgSamplesPerSec=6.332820121132463, CurrSamplesPerSec=5.3429333136735115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:47:28,766] [INFO] [timer.py:197:stop] 0/5610, RunningAvgSamplesPerSec=6.332821523370211, CurrSamplesPerSec=5.702558326024797, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:47:40,296] [INFO] [timer.py:197:stop] 0/5612, RunningAvgSamplesPerSec=6.332829560330361, CurrSamplesPerSec=5.71075842214971, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:47:51,692] [INFO] [timer.py:197:stop] 0/5614, RunningAvgSamplesPerSec=6.332812496434912, CurrSamplesPerSec=5.608840873003154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:48:03,251] [INFO] [timer.py:197:stop] 0/5616, RunningAvgSamplesPerSec=6.332808657762249, CurrSamplesPerSec=5.6850399804175815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:48:14,569] [INFO] [timer.py:197:stop] 0/5618, RunningAvgSamplesPerSec=6.332807418728847, CurrSamplesPerSec=5.673635721564932, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:48:25,871] [INFO] [logging.py:68:log_dist] [Rank 0] step=2810, skipped=5, 
lr=[4.880000000000001e-06], mom=[[0.9, 0.999]] [2022-12-17 05:48:25,872] [INFO] [timer.py:197:stop] 0/5620, RunningAvgSamplesPerSec=6.332805826090633, CurrSamplesPerSec=5.68405455531197, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:48:37,425] [INFO] [timer.py:197:stop] 0/5622, RunningAvgSamplesPerSec=6.332799703924759, CurrSamplesPerSec=5.670917772159438, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:48:48,976] [INFO] [timer.py:197:stop] 0/5624, RunningAvgSamplesPerSec=6.332805524442007, CurrSamplesPerSec=5.699043943930853, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:49:00,253] [INFO] [timer.py:197:stop] 0/5626, RunningAvgSamplesPerSec=6.332806477016322, CurrSamplesPerSec=5.683839843942882, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:49:11,510] [INFO] [timer.py:197:stop] 0/5628, RunningAvgSamplesPerSec=6.3328085207659495, CurrSamplesPerSec=5.6995364324043365, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:49:22,835] [INFO] [timer.py:197:stop] 0/5630, RunningAvgSamplesPerSec=6.332809979530304, CurrSamplesPerSec=5.70279868456956, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:49:34,163] [INFO] [timer.py:197:stop] 0/5632, RunningAvgSamplesPerSec=6.33280764518486, CurrSamplesPerSec=5.694752613713792, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:49:45,517] [INFO] [timer.py:197:stop] 0/5634, RunningAvgSamplesPerSec=6.332799728337538, CurrSamplesPerSec=5.660992264487115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:49:56,825] [INFO] [timer.py:197:stop] 0/5636, RunningAvgSamplesPerSec=6.332801426210607, CurrSamplesPerSec=5.695056109472525, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:50:08,283] [INFO] [timer.py:197:stop] 0/5638, RunningAvgSamplesPerSec=6.332765862833, CurrSamplesPerSec=5.530515181442866, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:50:19,615] [INFO] [logging.py:68:log_dist] [Rank 0] step=2820, skipped=5, 
lr=[4.857777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 05:50:19,616] [INFO] [timer.py:197:stop] 0/5640, RunningAvgSamplesPerSec=6.33276122207884, CurrSamplesPerSec=5.680091028575591, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:50:30,917] [INFO] [timer.py:197:stop] 0/5642, RunningAvgSamplesPerSec=6.332763728625373, CurrSamplesPerSec=5.711915503414802, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:50:42,205] [INFO] [timer.py:197:stop] 0/5644, RunningAvgSamplesPerSec=6.332769598022131, CurrSamplesPerSec=5.701877339420423, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:50:53,471] [INFO] [timer.py:197:stop] 0/5646, RunningAvgSamplesPerSec=6.332780589594118, CurrSamplesPerSec=5.726134919783692, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:51:04,789] [INFO] [timer.py:197:stop] 0/5648, RunningAvgSamplesPerSec=6.332779730197156, CurrSamplesPerSec=5.677553735791651, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:51:16,212] [INFO] [timer.py:197:stop] 0/5650, RunningAvgSamplesPerSec=6.332758376727921, CurrSamplesPerSec=5.638280024107815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0011, 'learning_rate': 4.846666666666667e-06, 'epoch': 11.97} [2022-12-17 05:51:27,493] [INFO] [timer.py:197:stop] 0/5652, RunningAvgSamplesPerSec=6.332766430333524, CurrSamplesPerSec=5.727767270671415, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:51:38,883] [INFO] [timer.py:197:stop] 0/5654, RunningAvgSamplesPerSec=6.332771971674966, CurrSamplesPerSec=5.71815494655829, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:51:50,324] [INFO] [timer.py:197:stop] 0/5656, RunningAvgSamplesPerSec=6.332751991785921, CurrSamplesPerSec=5.691306486070565, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:52:01,597] [INFO] [timer.py:197:stop] 0/5658, RunningAvgSamplesPerSec=6.332761303218689, CurrSamplesPerSec=5.726962708401693, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:52:12,901] 
[INFO] [logging.py:68:log_dist] [Rank 0] step=2830, skipped=5, lr=[4.835555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 05:52:12,902] [INFO] [timer.py:197:stop] 0/5660, RunningAvgSamplesPerSec=6.332763278259856, CurrSamplesPerSec=5.6826154326731615, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:52:24,257] [INFO] [timer.py:197:stop] 0/5662, RunningAvgSamplesPerSec=6.332754392749188, CurrSamplesPerSec=5.706142058898655, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:52:32,752] [INFO] [timer.py:197:stop] 0/5664, RunningAvgSamplesPerSec=6.333304594674231, CurrSamplesPerSec=10.171249085693656, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:52:44,051] [INFO] [timer.py:197:stop] 0/5666, RunningAvgSamplesPerSec=6.333307790781571, CurrSamplesPerSec=5.710589310286816, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:52:55,435] [INFO] [timer.py:197:stop] 0/5668, RunningAvgSamplesPerSec=6.333290583434095, CurrSamplesPerSec=5.666261032287638, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:53:06,766] [INFO] [timer.py:197:stop] 0/5670, RunningAvgSamplesPerSec=6.33328694885843, CurrSamplesPerSec=5.692666714254806, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:53:18,101] [INFO] [timer.py:197:stop] 0/5672, RunningAvgSamplesPerSec=6.333282186189515, CurrSamplesPerSec=5.670440039003418, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:53:29,691] [INFO] [timer.py:197:stop] 0/5674, RunningAvgSamplesPerSec=6.333221424987277, CurrSamplesPerSec=5.713069892305831, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:53:41,004] [INFO] [timer.py:197:stop] 0/5676, RunningAvgSamplesPerSec=6.333218738932709, CurrSamplesPerSec=5.699126221095192, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:53:52,433] [INFO] [timer.py:197:stop] 0/5678, RunningAvgSamplesPerSec=6.3332006328558315, CurrSamplesPerSec=5.607977987206557, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:54:03,976] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=2840, skipped=5, lr=[4.8133333333333336e-06], mom=[[0.9, 0.999]] [2022-12-17 05:54:03,978] [INFO] [timer.py:197:stop] 0/5680, RunningAvgSamplesPerSec=6.333199732486365, CurrSamplesPerSec=5.711543125327429, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:54:15,293] [INFO] [timer.py:197:stop] 0/5682, RunningAvgSamplesPerSec=6.333195939860297, CurrSamplesPerSec=5.6958679262802505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:54:26,666] [INFO] [timer.py:197:stop] 0/5684, RunningAvgSamplesPerSec=6.33318316638135, CurrSamplesPerSec=5.636633885696688, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:54:37,975] [INFO] [timer.py:197:stop] 0/5686, RunningAvgSamplesPerSec=6.3331854938121515, CurrSamplesPerSec=5.702362565085231, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:54:49,305] [INFO] [timer.py:197:stop] 0/5688, RunningAvgSamplesPerSec=6.333182590483902, CurrSamplesPerSec=5.704248049485593, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:55:00,779] [INFO] [timer.py:197:stop] 0/5690, RunningAvgSamplesPerSec=6.33314737555965, CurrSamplesPerSec=5.53050811692653, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:55:12,113] [INFO] [timer.py:197:stop] 0/5692, RunningAvgSamplesPerSec=6.333142736354172, CurrSamplesPerSec=5.687230903873154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:55:23,408] [INFO] [timer.py:197:stop] 0/5694, RunningAvgSamplesPerSec=6.333146815189666, CurrSamplesPerSec=5.720327101529128, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:55:35,018] [INFO] [timer.py:197:stop] 0/5696, RunningAvgSamplesPerSec=6.333081745735081, CurrSamplesPerSec=5.42185939073298, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:55:46,331] [INFO] [timer.py:197:stop] 0/5698, RunningAvgSamplesPerSec=6.3330818798007895, CurrSamplesPerSec=5.698708810029422, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:55:57,840] [INFO] 
[logging.py:68:log_dist] [Rank 0] step=2850, skipped=5, lr=[4.791111111111111e-06], mom=[[0.9, 0.999]] [2022-12-17 05:55:57,841] [INFO] [timer.py:197:stop] 0/5700, RunningAvgSamplesPerSec=6.333078349179951, CurrSamplesPerSec=5.692917347170979, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0008, 'learning_rate': 4.791111111111111e-06, 'epoch': 12.08} [2022-12-17 05:56:09,228] [INFO] [timer.py:197:stop] 0/5702, RunningAvgSamplesPerSec=6.333059736306311, CurrSamplesPerSec=5.597479915333243, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:56:20,516] [INFO] [timer.py:197:stop] 0/5704, RunningAvgSamplesPerSec=6.333065159279361, CurrSamplesPerSec=5.715673598766464, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:56:31,823] [INFO] [timer.py:197:stop] 0/5706, RunningAvgSamplesPerSec=6.333066730706159, CurrSamplesPerSec=5.701342790478605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:56:43,423] [INFO] [timer.py:197:stop] 0/5708, RunningAvgSamplesPerSec=6.333056047616498, CurrSamplesPerSec=5.715045689417052, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:56:54,704] [INFO] [timer.py:197:stop] 0/5710, RunningAvgSamplesPerSec=6.33306372097821, CurrSamplesPerSec=5.709811184638263, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:57:06,048] [INFO] [timer.py:197:stop] 0/5712, RunningAvgSamplesPerSec=6.333071282850042, CurrSamplesPerSec=5.705976131253787, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:57:17,368] [INFO] [timer.py:197:stop] 0/5714, RunningAvgSamplesPerSec=6.33306681450364, CurrSamplesPerSec=5.652331112271843, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:57:28,654] [INFO] [timer.py:197:stop] 0/5716, RunningAvgSamplesPerSec=6.3330734652637615, CurrSamplesPerSec=5.707509874027401, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:57:40,257] [INFO] [timer.py:197:stop] 0/5718, RunningAvgSamplesPerSec=6.3330681573262, CurrSamplesPerSec=5.672987041388273, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 05:57:51,815] [INFO] [logging.py:68:log_dist] [Rank 0] step=2860, skipped=5, lr=[4.768888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 05:57:51,817] [INFO] [timer.py:197:stop] 0/5720, RunningAvgSamplesPerSec=6.333063883374338, CurrSamplesPerSec=5.687774138233812, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:58:03,114] [INFO] [timer.py:197:stop] 0/5722, RunningAvgSamplesPerSec=6.333068138037818, CurrSamplesPerSec=5.7160036715575195, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:58:14,417] [INFO] [timer.py:197:stop] 0/5724, RunningAvgSamplesPerSec=6.3330711810578535, CurrSamplesPerSec=5.703349017171229, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:58:25,784] [INFO] [timer.py:197:stop] 0/5726, RunningAvgSamplesPerSec=6.3330600285572825, CurrSamplesPerSec=5.726510424690953, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:58:37,095] [INFO] [timer.py:197:stop] 0/5728, RunningAvgSamplesPerSec=6.3330607107411065, CurrSamplesPerSec=5.6806035677858535, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:58:48,461] [INFO] [timer.py:197:stop] 0/5730, RunningAvgSamplesPerSec=6.333046162093266, CurrSamplesPerSec=5.624621524913426, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:58:59,989] [INFO] [timer.py:197:stop] 0/5732, RunningAvgSamplesPerSec=6.333047457967725, CurrSamplesPerSec=5.699362660168836, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:59:11,345] [INFO] [timer.py:197:stop] 0/5734, RunningAvgSamplesPerSec=6.333049646595997, CurrSamplesPerSec=5.7098857569589505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:59:22,670] [INFO] [timer.py:197:stop] 0/5736, RunningAvgSamplesPerSec=6.3330471630973975, CurrSamplesPerSec=5.656524632617846, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:59:34,092] [INFO] [timer.py:197:stop] 0/5738, RunningAvgSamplesPerSec=6.333058066751148, CurrSamplesPerSec=5.725679347229084, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 05:59:45,411] [INFO] [logging.py:68:log_dist] [Rank 0] step=2870, skipped=5, lr=[4.746666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 05:59:45,413] [INFO] [timer.py:197:stop] 0/5740, RunningAvgSamplesPerSec=6.333056508201623, CurrSamplesPerSec=5.693592330700572, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 05:59:56,699] [INFO] [timer.py:197:stop] 0/5742, RunningAvgSamplesPerSec=6.333056869621617, CurrSamplesPerSec=5.682889724245197, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:00:07,969] [INFO] [timer.py:197:stop] 0/5744, RunningAvgSamplesPerSec=6.3330669801920125, CurrSamplesPerSec=5.732264465585629, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:00:19,237] [INFO] [timer.py:197:stop] 0/5746, RunningAvgSamplesPerSec=6.333077193576914, CurrSamplesPerSec=5.734636504753801, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:00:30,764] [INFO] [timer.py:197:stop] 0/5748, RunningAvgSamplesPerSec=6.333027780024121, CurrSamplesPerSec=5.457497362896864, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:00:42,080] [INFO] [timer.py:197:stop] 0/5750, RunningAvgSamplesPerSec=6.333026353581669, CurrSamplesPerSec=5.670943410065879, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0007, 'learning_rate': 4.735555555555556e-06, 'epoch': 12.18} [2022-12-17 06:00:53,386] [INFO] [timer.py:197:stop] 0/5752, RunningAvgSamplesPerSec=6.333034645889236, CurrSamplesPerSec=5.7249857459432505, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:01:04,646] [INFO] [timer.py:197:stop] 0/5754, RunningAvgSamplesPerSec=6.333043290240301, CurrSamplesPerSec=5.723611739462003, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:01:15,937] [INFO] [timer.py:197:stop] 0/5756, RunningAvgSamplesPerSec=6.333048192210894, CurrSamplesPerSec=5.7072176689809595, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:01:27,234] [INFO] [timer.py:197:stop] 0/5758, 
RunningAvgSamplesPerSec=6.333051890604268, CurrSamplesPerSec=5.698058980366412, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:01:38,543] [INFO] [logging.py:68:log_dist] [Rank 0] step=2880, skipped=5, lr=[4.724444444444445e-06], mom=[[0.9, 0.999]] [2022-12-17 06:01:38,545] [INFO] [timer.py:197:stop] 0/5760, RunningAvgSamplesPerSec=6.333052965722199, CurrSamplesPerSec=5.686170522167649, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:01:49,806] [INFO] [timer.py:197:stop] 0/5762, RunningAvgSamplesPerSec=6.333061178432799, CurrSamplesPerSec=5.7263543044379785, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:02:01,134] [INFO] [timer.py:197:stop] 0/5764, RunningAvgSamplesPerSec=6.333058222287947, CurrSamplesPerSec=5.685659145615068, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:02:12,459] [INFO] [timer.py:197:stop] 0/5766, RunningAvgSamplesPerSec=6.333056332696115, CurrSamplesPerSec=5.68130040403441, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:02:23,762] [INFO] [timer.py:197:stop] 0/5768, RunningAvgSamplesPerSec=6.33305904526347, CurrSamplesPerSec=5.6914582876511375, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:02:35,061] [INFO] [timer.py:197:stop] 0/5770, RunningAvgSamplesPerSec=6.333062133887484, CurrSamplesPerSec=5.714090461007016, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:02:46,746] [INFO] [timer.py:197:stop] 0/5772, RunningAvgSamplesPerSec=6.332981545034309, CurrSamplesPerSec=5.330093869140605, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:02:58,054] [INFO] [timer.py:197:stop] 0/5774, RunningAvgSamplesPerSec=6.332982922697146, CurrSamplesPerSec=5.684175638494878, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:03:09,529] [INFO] [timer.py:197:stop] 0/5776, RunningAvgSamplesPerSec=6.332982962694763, CurrSamplesPerSec=5.70664256871121, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:03:20,858] [INFO] [timer.py:197:stop] 0/5778, 
RunningAvgSamplesPerSec=6.332978596550738, CurrSamplesPerSec=5.674524450355167, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:03:32,169] [INFO] [logging.py:68:log_dist] [Rank 0] step=2890, skipped=5, lr=[4.7022222222222225e-06], mom=[[0.9, 0.999]] [2022-12-17 06:03:32,170] [INFO] [timer.py:197:stop] 0/5780, RunningAvgSamplesPerSec=6.332980005918211, CurrSamplesPerSec=5.709917092489247, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:03:43,748] [INFO] [timer.py:197:stop] 0/5782, RunningAvgSamplesPerSec=6.33297547290769, CurrSamplesPerSec=5.677242016650395, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:03:55,179] [INFO] [timer.py:197:stop] 0/5784, RunningAvgSamplesPerSec=6.332969718676225, CurrSamplesPerSec=5.670225157206506, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:04:06,469] [INFO] [timer.py:197:stop] 0/5786, RunningAvgSamplesPerSec=6.33297987413485, CurrSamplesPerSec=5.703800073451162, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:04:17,960] [INFO] [timer.py:197:stop] 0/5788, RunningAvgSamplesPerSec=6.332987330191376, CurrSamplesPerSec=5.696377756282663, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:04:29,355] [INFO] [timer.py:197:stop] 0/5790, RunningAvgSamplesPerSec=6.332994329889643, CurrSamplesPerSec=5.71702041843999, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:04:40,697] [INFO] [timer.py:197:stop] 0/5792, RunningAvgSamplesPerSec=6.3329883259810735, CurrSamplesPerSec=5.636968860406683, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:04:51,944] [INFO] [timer.py:197:stop] 0/5794, RunningAvgSamplesPerSec=6.333002093160635, CurrSamplesPerSec=5.73800064443515, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:05:03,355] [INFO] [timer.py:197:stop] 0/5796, RunningAvgSamplesPerSec=6.332997662056815, CurrSamplesPerSec=5.709971991224141, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:05:14,590] [INFO] [timer.py:197:stop] 0/5798, 
RunningAvgSamplesPerSec=6.3330116405150365, CurrSamplesPerSec=5.737638347737378, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:05:26,187] [INFO] [logging.py:68:log_dist] [Rank 0] step=2900, skipped=5, lr=[4.680000000000001e-06], mom=[[0.9, 0.999]] [2022-12-17 06:05:26,189] [INFO] [timer.py:197:stop] 0/5800, RunningAvgSamplesPerSec=6.333014348129561, CurrSamplesPerSec=5.693146992655114, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0008, 'learning_rate': 4.680000000000001e-06, 'epoch': 12.29} [2022-12-17 06:05:37,429] [INFO] [timer.py:197:stop] 0/5802, RunningAvgSamplesPerSec=6.333029910069844, CurrSamplesPerSec=5.731119927273069, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:05:48,682] [INFO] [timer.py:197:stop] 0/5804, RunningAvgSamplesPerSec=6.333035842662751, CurrSamplesPerSec=5.704081019890971, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:06:00,097] [INFO] [timer.py:197:stop] 0/5806, RunningAvgSamplesPerSec=6.333042504696337, CurrSamplesPerSec=5.712226909053271, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:06:11,623] [INFO] [timer.py:197:stop] 0/5808, RunningAvgSamplesPerSec=6.333043987856465, CurrSamplesPerSec=5.68961477239248, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:06:22,901] [INFO] [timer.py:197:stop] 0/5810, RunningAvgSamplesPerSec=6.33304796554682, CurrSamplesPerSec=5.692458594636181, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:06:34,200] [INFO] [timer.py:197:stop] 0/5812, RunningAvgSamplesPerSec=6.333053093884182, CurrSamplesPerSec=5.704069383941154, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:06:45,832] [INFO] [timer.py:197:stop] 0/5814, RunningAvgSamplesPerSec=6.333036366855897, CurrSamplesPerSec=5.698952715852549, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:06:57,143] [INFO] [timer.py:197:stop] 0/5816, RunningAvgSamplesPerSec=6.333036955464624, CurrSamplesPerSec=5.704274232079576, MemAllocated=3.0GB, MaxMemAllocated=19.53GB 
[2022-12-17 06:07:08,666] [INFO] [timer.py:197:stop] 0/5818, RunningAvgSamplesPerSec=6.33303540294264, CurrSamplesPerSec=5.693990392427025, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:07:20,167] [INFO] [logging.py:68:log_dist] [Rank 0] step=2910, skipped=5, lr=[4.6577777777777785e-06], mom=[[0.9, 0.999]] [2022-12-17 06:07:20,169] [INFO] [timer.py:197:stop] 0/5820, RunningAvgSamplesPerSec=6.333035206874964, CurrSamplesPerSec=5.698076155639671, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:07:31,475] [INFO] [timer.py:197:stop] 0/5822, RunningAvgSamplesPerSec=6.333036966743757, CurrSamplesPerSec=5.705704457700638, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:07:42,769] [INFO] [timer.py:197:stop] 0/5824, RunningAvgSamplesPerSec=6.333041189439793, CurrSamplesPerSec=5.708153121779355, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:07:54,295] [INFO] [timer.py:197:stop] 0/5826, RunningAvgSamplesPerSec=6.333039135741505, CurrSamplesPerSec=5.678836995791065, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:08:05,618] [INFO] [timer.py:197:stop] 0/5828, RunningAvgSamplesPerSec=6.333037473781379, CurrSamplesPerSec=5.682891167955098, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:08:16,973] [INFO] [timer.py:197:stop] 0/5830, RunningAvgSamplesPerSec=6.333024679570001, CurrSamplesPerSec=5.645582961104968, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:08:28,288] [INFO] [timer.py:197:stop] 0/5832, RunningAvgSamplesPerSec=6.333023503984594, CurrSamplesPerSec=5.709940412176869, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:08:39,578] [INFO] [timer.py:197:stop] 0/5834, RunningAvgSamplesPerSec=6.333028469659971, CurrSamplesPerSec=5.7119590155108995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:08:50,859] [INFO] [timer.py:197:stop] 0/5836, RunningAvgSamplesPerSec=6.333035625576971, CurrSamplesPerSec=5.6891756027981115, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 
06:09:02,335] [INFO] [timer.py:197:stop] 0/5838, RunningAvgSamplesPerSec=6.333039610848032, CurrSamplesPerSec=5.70028851881641, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:09:13,596] [INFO] [logging.py:68:log_dist] [Rank 0] step=2920, skipped=5, lr=[4.635555555555556e-06], mom=[[0.9, 0.999]] [2022-12-17 06:09:13,598] [INFO] [timer.py:197:stop] 0/5840, RunningAvgSamplesPerSec=6.333048189129644, CurrSamplesPerSec=5.726195505402264, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:09:24,852] [INFO] [timer.py:197:stop] 0/5842, RunningAvgSamplesPerSec=6.333057708895621, CurrSamplesPerSec=5.724060147746541, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:09:36,268] [INFO] [timer.py:197:stop] 0/5844, RunningAvgSamplesPerSec=6.333058364260927, CurrSamplesPerSec=5.700625049162459, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:09:47,565] [INFO] [timer.py:197:stop] 0/5846, RunningAvgSamplesPerSec=6.333062597156241, CurrSamplesPerSec=5.706827461725405, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:09:58,850] [INFO] [timer.py:197:stop] 0/5848, RunningAvgSamplesPerSec=6.3330712101826085, CurrSamplesPerSec=5.722144956467561, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:10:10,241] [INFO] [timer.py:197:stop] 0/5850, RunningAvgSamplesPerSec=6.333076512104131, CurrSamplesPerSec=5.716131962346995, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.001, 'learning_rate': 4.624444444444445e-06, 'epoch': 12.39} [2022-12-17 06:10:21,549] [INFO] [timer.py:197:stop] 0/5852, RunningAvgSamplesPerSec=6.333077908595145, CurrSamplesPerSec=5.686055135248809, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:10:32,841] [INFO] [timer.py:197:stop] 0/5854, RunningAvgSamplesPerSec=6.333082986621699, CurrSamplesPerSec=5.695888472429112, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:10:44,226] [INFO] [timer.py:197:stop] 0/5856, RunningAvgSamplesPerSec=6.333068188018257, 
CurrSamplesPerSec=5.685718877744978, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:10:55,591] [INFO] [timer.py:197:stop] 0/5858, RunningAvgSamplesPerSec=6.333058188138381, CurrSamplesPerSec=5.6309342973964815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:11:06,930] [INFO] [logging.py:68:log_dist] [Rank 0] step=2930, skipped=5, lr=[4.613333333333334e-06], mom=[[0.9, 0.999]] [2022-12-17 06:11:06,931] [INFO] [timer.py:197:stop] 0/5860, RunningAvgSamplesPerSec=6.333059374007877, CurrSamplesPerSec=5.7035758694456495, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:11:18,215] [INFO] [timer.py:197:stop] 0/5862, RunningAvgSamplesPerSec=6.333066234612839, CurrSamplesPerSec=5.7189429022784335, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:11:29,480] [INFO] [timer.py:197:stop] 0/5864, RunningAvgSamplesPerSec=6.333073388154316, CurrSamplesPerSec=5.700475421412082, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:11:40,883] [INFO] [timer.py:197:stop] 0/5866, RunningAvgSamplesPerSec=6.333075174763755, CurrSamplesPerSec=5.705023932007137, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:11:52,463] [INFO] [timer.py:197:stop] 0/5868, RunningAvgSamplesPerSec=6.33308124388736, CurrSamplesPerSec=5.7109671530431925, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:12:03,942] [INFO] [timer.py:197:stop] 0/5870, RunningAvgSamplesPerSec=6.3330868665001185, CurrSamplesPerSec=5.7183637312413875, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:12:15,485] [INFO] [timer.py:197:stop] 0/5872, RunningAvgSamplesPerSec=6.333047419957166, CurrSamplesPerSec=5.499836215038959, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:12:26,779] [INFO] [timer.py:197:stop] 0/5874, RunningAvgSamplesPerSec=6.333052248890246, CurrSamplesPerSec=5.716640313397472, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:12:38,045] [INFO] [timer.py:197:stop] 0/5876, RunningAvgSamplesPerSec=6.333063104667585, 
CurrSamplesPerSec=5.7382314883322785, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:12:49,591] [INFO] [timer.py:197:stop] 0/5878, RunningAvgSamplesPerSec=6.333007492183222, CurrSamplesPerSec=5.444904238732139, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:13:00,893] [INFO] [logging.py:68:log_dist] [Rank 0] step=2940, skipped=5, lr=[4.591111111111111e-06], mom=[[0.9, 0.999]] [2022-12-17 06:13:00,895] [INFO] [timer.py:197:stop] 0/5880, RunningAvgSamplesPerSec=6.333009913626235, CurrSamplesPerSec=5.695302844432262, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:13:12,177] [INFO] [timer.py:197:stop] 0/5882, RunningAvgSamplesPerSec=6.333018136040713, CurrSamplesPerSec=5.718970438362116, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:13:23,901] [INFO] [timer.py:197:stop] 0/5884, RunningAvgSamplesPerSec=6.333017939189659, CurrSamplesPerSec=5.704586259823864, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:13:35,218] [INFO] [timer.py:197:stop] 0/5886, RunningAvgSamplesPerSec=6.333018089438299, CurrSamplesPerSec=5.71238542051308, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:13:46,725] [INFO] [timer.py:197:stop] 0/5888, RunningAvgSamplesPerSec=6.333024751875963, CurrSamplesPerSec=5.699916444826087, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:13:58,297] [INFO] [timer.py:197:stop] 0/5890, RunningAvgSamplesPerSec=6.333027057896281, CurrSamplesPerSec=5.704002478150968, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:14:09,601] [INFO] [timer.py:197:stop] 0/5892, RunningAvgSamplesPerSec=6.333029796735478, CurrSamplesPerSec=5.7104729302110275, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:14:20,923] [INFO] [timer.py:197:stop] 0/5894, RunningAvgSamplesPerSec=6.333037543211003, CurrSamplesPerSec=5.718080401695344, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:14:32,249] [INFO] [timer.py:197:stop] 0/5896, RunningAvgSamplesPerSec=6.333035956380963, 
CurrSamplesPerSec=5.708275476630145, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:14:43,527] [INFO] [timer.py:197:stop] 0/5898, RunningAvgSamplesPerSec=6.3330439846194055, CurrSamplesPerSec=5.728124165914754, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:14:54,832] [INFO] [logging.py:68:log_dist] [Rank 0] step=2950, skipped=5, lr=[4.568888888888889e-06], mom=[[0.9, 0.999]] [2022-12-17 06:14:54,834] [INFO] [timer.py:197:stop] 0/5900, RunningAvgSamplesPerSec=6.333046513333144, CurrSamplesPerSec=5.708106754545724, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0012, 'learning_rate': 4.568888888888889e-06, 'epoch': 12.5} [2022-12-17 06:15:06,151] [INFO] [timer.py:197:stop] 0/5902, RunningAvgSamplesPerSec=6.333046281630883, CurrSamplesPerSec=5.713112206069512, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:15:17,464] [INFO] [timer.py:197:stop] 0/5904, RunningAvgSamplesPerSec=6.333052801023202, CurrSamplesPerSec=5.71561007143143, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:15:28,833] [INFO] [timer.py:197:stop] 0/5906, RunningAvgSamplesPerSec=6.333035290080761, CurrSamplesPerSec=5.608293863527618, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:15:40,572] [INFO] [timer.py:197:stop] 0/5908, RunningAvgSamplesPerSec=6.333040218050896, CurrSamplesPerSec=5.708095102165659, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:15:51,848] [INFO] [timer.py:197:stop] 0/5910, RunningAvgSamplesPerSec=6.333052727101973, CurrSamplesPerSec=5.736807960714468, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:16:03,374] [INFO] [timer.py:197:stop] 0/5912, RunningAvgSamplesPerSec=6.333008477793644, CurrSamplesPerSec=5.4947222221881065, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:16:14,794] [INFO] [timer.py:197:stop] 0/5914, RunningAvgSamplesPerSec=6.333011937522531, CurrSamplesPerSec=5.707201651961366, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:16:26,254] [INFO] 
[timer.py:197:stop] 0/5916, RunningAvgSamplesPerSec=6.333016941064426, CurrSamplesPerSec=5.697815392920308, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:16:37,755] [INFO] [timer.py:197:stop] 0/5918, RunningAvgSamplesPerSec=6.332988295785954, CurrSamplesPerSec=5.547189727597872, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:16:49,165] [INFO] [logging.py:68:log_dist] [Rank 0] step=2960, skipped=5, lr=[4.546666666666667e-06], mom=[[0.9, 0.999]] [2022-12-17 06:16:49,166] [INFO] [timer.py:197:stop] 0/5920, RunningAvgSamplesPerSec=6.332992632489854, CurrSamplesPerSec=5.715456978790322, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:17:00,543] [INFO] [timer.py:197:stop] 0/5922, RunningAvgSamplesPerSec=6.332990709493712, CurrSamplesPerSec=5.685131726842036, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:17:11,794] [INFO] [timer.py:197:stop] 0/5924, RunningAvgSamplesPerSec=6.332998285720652, CurrSamplesPerSec=5.722485780413905, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:17:23,464] [INFO] [timer.py:197:stop] 0/5926, RunningAvgSamplesPerSec=6.332999240441912, CurrSamplesPerSec=5.710772029286689, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:17:34,840] [INFO] [timer.py:197:stop] 0/5928, RunningAvgSamplesPerSec=6.33299967952669, CurrSamplesPerSec=5.670417519968683, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:17:46,128] [INFO] [timer.py:197:stop] 0/5930, RunningAvgSamplesPerSec=6.333002653796711, CurrSamplesPerSec=5.716089116964696, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:17:57,453] [INFO] [timer.py:197:stop] 0/5932, RunningAvgSamplesPerSec=6.33300097953948, CurrSamplesPerSec=5.719599697949569, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:18:08,773] [INFO] [timer.py:197:stop] 0/5934, RunningAvgSamplesPerSec=6.333000297221696, CurrSamplesPerSec=5.683653789922553, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:18:20,073] [INFO] 
[timer.py:197:stop] 0/5936, RunningAvgSamplesPerSec=6.3330006554919525, CurrSamplesPerSec=5.685189521385519, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:18:31,402] [INFO] [timer.py:197:stop] 0/5938, RunningAvgSamplesPerSec=6.332996660401324, CurrSamplesPerSec=5.696287580558719, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:18:42,786] [INFO] [logging.py:68:log_dist] [Rank 0] step=2970, skipped=5, lr=[4.524444444444444e-06], mom=[[0.9, 0.999]] [2022-12-17 06:18:42,787] [INFO] [timer.py:197:stop] 0/5940, RunningAvgSamplesPerSec=6.332994887762245, CurrSamplesPerSec=5.695508030136036, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:18:54,075] [INFO] [timer.py:197:stop] 0/5942, RunningAvgSamplesPerSec=6.333001122764453, CurrSamplesPerSec=5.720114516738884, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:19:05,677] [INFO] [timer.py:197:stop] 0/5944, RunningAvgSamplesPerSec=6.333004765281942, CurrSamplesPerSec=5.715796276237501, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:19:17,231] [INFO] [timer.py:197:stop] 0/5946, RunningAvgSamplesPerSec=6.333007590785375, CurrSamplesPerSec=5.718903913586931, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:19:28,821] [INFO] [timer.py:197:stop] 0/5948, RunningAvgSamplesPerSec=6.332955003707633, CurrSamplesPerSec=5.437463012309892, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:19:40,183] [INFO] [timer.py:197:stop] 0/5950, RunningAvgSamplesPerSec=6.332957337549395, CurrSamplesPerSec=5.6944965039264135, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0012, 'learning_rate': 4.513333333333333e-06, 'epoch': 12.61} [2022-12-17 06:19:51,474] [INFO] [timer.py:197:stop] 0/5952, RunningAvgSamplesPerSec=6.332965662367068, CurrSamplesPerSec=5.721009088510862, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:20:02,822] [INFO] [timer.py:197:stop] 0/5954, RunningAvgSamplesPerSec=6.332961677367649, CurrSamplesPerSec=5.668551218034829, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:20:14,160] [INFO] [timer.py:197:stop] 0/5956, RunningAvgSamplesPerSec=6.332959969392417, CurrSamplesPerSec=5.677735067371684, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:20:25,505] [INFO] [timer.py:197:stop] 0/5958, RunningAvgSamplesPerSec=6.332957888895088, CurrSamplesPerSec=5.671679580428901, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:20:36,919] [INFO] [logging.py:68:log_dist] [Rank 0] step=2980, skipped=5, lr=[4.502222222222223e-06], mom=[[0.9, 0.999]] [2022-12-17 06:20:36,920] [INFO] [timer.py:197:stop] 0/5960, RunningAvgSamplesPerSec=6.3329590493220325, CurrSamplesPerSec=5.697653335352861, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:20:48,227] [INFO] [timer.py:197:stop] 0/5962, RunningAvgSamplesPerSec=6.332961638674421, CurrSamplesPerSec=5.70620222218292, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:20:59,636] [INFO] [timer.py:197:stop] 0/5964, RunningAvgSamplesPerSec=6.332966611609955, CurrSamplesPerSec=5.718364949399794, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:21:11,169] [INFO] [timer.py:197:stop] 0/5966, RunningAvgSamplesPerSec=6.332969463847246, CurrSamplesPerSec=5.718971413095601, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:21:22,408] [INFO] [timer.py:197:stop] 0/5968, RunningAvgSamplesPerSec=6.332982594177314, CurrSamplesPerSec=5.7341594900877935, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:21:33,667] [INFO] [timer.py:197:stop] 0/5970, RunningAvgSamplesPerSec=6.3329947216668545, CurrSamplesPerSec=5.718728958379074, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:21:45,273] [INFO] [timer.py:197:stop] 0/5972, RunningAvgSamplesPerSec=6.332985761956139, CurrSamplesPerSec=5.712217914034815, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:21:56,574] [INFO] [timer.py:197:stop] 0/5974, RunningAvgSamplesPerSec=6.332989670456562, CurrSamplesPerSec=5.711422331639339, 
MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:22:08,107] [INFO] [timer.py:197:stop] 0/5976, RunningAvgSamplesPerSec=6.332995013267368, CurrSamplesPerSec=5.723771859777583, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:22:19,699] [INFO] [timer.py:197:stop] 0/5978, RunningAvgSamplesPerSec=6.332998115014715, CurrSamplesPerSec=5.734231759936096, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:22:31,011] [INFO] [logging.py:68:log_dist] [Rank 0] step=2990, skipped=5, lr=[4.48e-06], mom=[[0.9, 0.999]] [2022-12-17 06:22:31,012] [INFO] [timer.py:197:stop] 0/5980, RunningAvgSamplesPerSec=6.332999323998145, CurrSamplesPerSec=5.665226150268889, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:22:42,389] [INFO] [timer.py:197:stop] 0/5982, RunningAvgSamplesPerSec=6.333005629850955, CurrSamplesPerSec=5.700576867200941, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:22:53,952] [INFO] [timer.py:197:stop] 0/5984, RunningAvgSamplesPerSec=6.33301324936841, CurrSamplesPerSec=5.732907918338295, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:23:05,236] [INFO] [timer.py:197:stop] 0/5986, RunningAvgSamplesPerSec=6.333020150963738, CurrSamplesPerSec=5.709220747511372, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:23:16,563] [INFO] [timer.py:197:stop] 0/5988, RunningAvgSamplesPerSec=6.333027199070639, CurrSamplesPerSec=5.72047850468883, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:23:27,940] [INFO] [timer.py:197:stop] 0/5990, RunningAvgSamplesPerSec=6.333021411493446, CurrSamplesPerSec=5.706681147760475, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:23:39,270] [INFO] [timer.py:197:stop] 0/5992, RunningAvgSamplesPerSec=6.333012549087907, CurrSamplesPerSec=5.6532462764159055, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:23:50,584] [INFO] [timer.py:197:stop] 0/5994, RunningAvgSamplesPerSec=6.333012546323928, CurrSamplesPerSec=5.690703945040657, MemAllocated=3.0GB, 
MaxMemAllocated=19.53GB [2022-12-17 06:24:02,141] [INFO] [timer.py:197:stop] 0/5996, RunningAvgSamplesPerSec=6.3329965499097245, CurrSamplesPerSec=5.633968527653969, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:24:13,457] [INFO] [timer.py:197:stop] 0/5998, RunningAvgSamplesPerSec=6.332990861516932, CurrSamplesPerSec=5.682410453102639, MemAllocated=3.0GB, MaxMemAllocated=19.53GB [2022-12-17 06:24:24,871] [INFO] [logging.py:68:log_dist] [Rank 0] step=3000, skipped=5, lr=[4.457777777777778e-06], mom=[[0.9, 0.999]] [2022-12-17 06:24:24,873] [INFO] [timer.py:197:stop] 0/6000, RunningAvgSamplesPerSec=6.332989796557147, CurrSamplesPerSec=5.6977181572736075, MemAllocated=3.0GB, MaxMemAllocated=19.53GB {'loss': 0.0011, 'learning_rate': 4.457777777777778e-06, 'epoch': 12.71} {'eval_loss': 0.191650390625, 'eval_wer': 9.403141746929155, 'eval_runtime': 2117.549, 'eval_samples_per_second': 3.643, 'eval_steps_per_second': 0.456, 'epoch': 12.71} [2022-12-17 06:59:46,120] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step3000 is begin to save! [2022-12-17 06:59:46,130] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: ./checkpoint-3000/global_step3000/mp_rank_00_model_states.pt [2022-12-17 06:59:46,130] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-3000/global_step3000/mp_rank_00_model_states.pt... [2022-12-17 06:59:49,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-3000/global_step3000/mp_rank_00_model_states.pt. [2022-12-17 06:59:49,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving ./checkpoint-3000/global_step3000/zero_pp_rank_0_mp_rank_00_optim_states.pt... [2022-12-17 07:00:04,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved ./checkpoint-3000/global_step3000/zero_pp_rank_0_mp_rank_00_optim_states.pt. 
[2022-12-17 07:00:04,837] [INFO] [engine.py:3269:_save_zero_checkpoint] zero checkpoint saved ./checkpoint-3000/global_step3000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2022-12-17 07:00:04,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now!